The 4 Things Qwen-3’s Chat Template Teaches Us

Published April 30, 2025
Update on GitHub

What a boring Jinja snippet tells us about the new Qwen-3 model.

The new Qwen-3 model by Qwen ships with a much more sophisticated chat template than it's predecessors Qwen-2.5 and QwQ. By taking a look at the differences in the Jinja template, we can find interesting insights into the new model.

Chat Templates

What is a Chat Template?

A chat template defines how conversations between users and models are structured and formatted. The template acts as a translator, converting a human-readable conversation:

  [
    { role: "user", content: "Hi there!" },
    { role: "assistant", content: "Hi there, how can I help you today?" },
    { role: "user", content: "I'm looking for a new pair of shoes." },
  ]

into a model friendly format:

<|im_start|>user
Hi there!<|im_end|>
<|im_start|>assistant
Hi there, how can I help you today?<|im_end|>
<|im_start|>user
I'm looking for a new pair of shoes.<|im_end|>
<|im_start|>assistant
<think>

</think>

You can easily view the chat template for a given model on the Hugging Face model page.

Chat Template for Qwen/Qwen3-235B-A22B

Let's dive into the Qwen-3 chat template and see what we can learn!

1. Reasoning doesn't have to be forced

and you can make it optional via a simple prefill...

Qwen-3 is unique in it's ability to toggle reasoning via the enable_thinking flag. When set to false, the template inserts an empty <think></think> pair, telling the model to skip step‑by‑step thoughts. Earlier models baked the <think> tag into every generation, forcing chain‑of‑thought whether you wanted it or not.

{# Qwen-3 #}
{%- if enable_thinking is defined and enable_thinking is false %}
    {{- '<think>\n\n</think>\n\n' }}
{%- endif %}

QwQ for example, forces reasoning in every conversation.

{# QwQ #}
{%- if add_generation_prompt %}
    {{- '<|im_start|>assistant\n<think>\n' }}
{%- endif %}

If the enable_thinking is true, the model is able to decide whether to think or not.

You can test test out the template with the following code:

import { Template } from "@huggingface/jinja";
import { downloadFile } from "@huggingface/hub";

const HF_TOKEN = process.env.HF_TOKEN;

const file = await downloadFile({
  repo: "Qwen/Qwen3-235B-A22B",
  path: "tokenizer_config.json",
  accessToken: HF_TOKEN,
});
const config = await file!.json();

const template = new Template(config.chat_template);
const result = template.render({
  messages,
  add_generation_prompt: true,
  enable_thinking: false,  
  bos_token: config.bos_token,
  eos_token: config.eos_token,
});

2. Context Management Should be Dynamic

Qwen-3 utilizes a rolling checkpoint system, intelligently preserving or pruning reasoning blocks to maintain relevant context. Older models discarded reasoning prematurely to save tokens.

Qwen-3 introduces a "rolling checkpoint" by traversing the message list in reverse to find the latest user turn that wasn’t a tool call. For any assistant replies after that index it keeps the full <think> blocks; everything earlier is stripped out.

Why this matters:

  • Keeps the active plan visible during a multi‑step tool call.
  • Supports nested tool workflows without losing context.
  • Saves tokens by pruning thoughts the model no longer needs.
  • Prevents "stale" reasoning from bleeding into new tasks.

Example

Here's an example of chain-of-thought preservation through tool calls with Qwen-3 and QwQ. image/png

Check out @huggingface/jinja for testing out the chat templates

3. Tool Arguments Need Better Serialization

Before, every tool_call.arguments field was piped through | tojson, even if it was already a JSON‑encoded string—risking double‑escaping. Qwen‑3 checks the type first and only serializes when necessary.

{# Qwen3 #}
{%- if tool_call.arguments is string %}
    {{- tool_call.arguments }}
{%- else %}
    {{- tool_call.arguments | tojson }}
{%- endif %}

4. There's No Need for a Default System Prompt

Like many models, the Qwen‑2.5 series has a default system prompt.

You are Qwen, created by Alibaba Cloud. You are a helpful assistant.

This is pretty common as it helps models respond to user questions like "Who are you?".

Qwen-3 and QwQ ship without this default system prompt. Despite this, the model can still accurately identify its creator if you ask it.

Conclusion

Qwen-3 shows us that through the chat_template we can provide better flexibility, smarter context handling, and improved tool interaction. These improvements not only improve capabilities, but also make agentic workflows more reliable and efficent.

Community

Your need to confirm your account before you can post a new comment.

Sign up or log in to comment