Update tokenizer_config.json

#11

add "{% if enable_thinking is defined and enable_thinking is false %}{{'<think>\\n\\n</think>\\n\\n'}}{% endif %}"

Add support for empty think block injection in chat template

Description

This PR adds support for the enable_thinking parameter in the chat template to control chain-of-thought reasoning, achieving feature parity with Qwen3.

Why it's needed

Many inference frameworks (SGLang, vLLM) and applications need to control whether models use reasoning steps. The enable_thinking parameter provides a standardized way to:

  • Improve inference speed when reasoning isn't needed
  • Ensure consistent output structure for parsing
  • Match behavior across different model families

Usage

# With thinking enabled (default behavior - unchanged)
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True  # or omit for default
)
# Rendered prompt ends with: <|Assistant|>

# With thinking disabled (new behavior)
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False
)
# Rendered prompt ends with: <|Assistant|><think>\n\n</think>\n\n
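At the serving layer, frameworks that forward chat-template kwargs can toggle this per request. Below is a hedged sketch against vLLM's OpenAI-compatible server, which passes chat_template_kwargs through to apply_chat_template; the endpoint and model id are placeholders, not values from this PR.

# Per-request toggle via vLLM's OpenAI-compatible server.
# base_url, api_key, and the model id below are placeholders.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1",  # placeholder model id
    messages=[{"role": "user", "content": "What is 2 + 2?"}],
    extra_body={"chat_template_kwargs": {"enable_thinking": False}},
)
print(response.choices[0].message.content)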

Implementation

The change adds a single line to inject an empty think block when enable_thinking=False:

{% if enable_thinking is defined and enable_thinking is false %}{{'<think>\n\n</think>\n\n'}}{% endif %}

This follows Qwen3's approach where:

  • enable_thinking=False strictly disables reasoning by injecting an empty think block
  • The empty block signals to the model to skip chain-of-thought generation
  • Disabling thinking is recommended for efficiency-critical scenarios
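To sanity-check the conditional in isolation, the new fragment can be rendered directly with jinja2, the engine transformers uses for chat templates. The harness below is a minimal illustrative sketch; only the fragment itself comes from this PR:

# Render just the added fragment to confirm when the empty think
# block is emitted (requires jinja2 >= 2.11 for the "is false" test).
from jinja2 import Template

fragment = Template(
    "{% if enable_thinking is defined and enable_thinking is false %}"
    "{{'<think>\\n\\n</think>\\n\\n'}}"
    "{% endif %}"
)

print(repr(fragment.render(enable_thinking=False)))  # '<think>\n\n</think>\n\n'
print(repr(fragment.render(enable_thinking=True)))   # ''
print(repr(fragment.render()))                       # '' (undefined is a no-op)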

Backward Compatibility

Fully backward compatible: the template only changes behavior when enable_thinking=False is explicitly set.

The DeepSeek API doesn't support this, and R1 has likely not been trained with this in mind. Why PR something that might not even work, especially when you're outside of the org?

Besides, R1 is solely a reasoning model and V3 is solely a chat/instruct model; they're not combined (not in this iteration at least, and if they were, there's a solid chance they would've been named something else).

No.

Do you get good inference results with this change, @ehartford?

Let me borrow an MI300X node to test it.

erichartford changed pull request status to closed
