Update tokenizer_config.json

#11

add "{% if enable_thinking is defined and enable_thinking is false %}{{'<think>\\n\\n</think>\\n\\n'}}{% endif %}"

Add support for empty think block injection in chat template

Description

This PR adds support for the enable_thinking parameter in the chat template to control chain-of-thought reasoning, achieving feature parity with Qwen3.

Why it's needed

Many inference frameworks (SGLang, vLLM) and applications need to control whether models use reasoning steps. The enable_thinking parameter provides a standardized way to:

  • Improve inference speed when reasoning isn't needed
  • Ensure consistent output structure for parsing
  • Match behavior across different model families

Usage

# With thinking enabled (default behavior - unchanged)
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True  # or omit for default
)
# Rendered prompt ends with: <|Assistant|>

# With thinking disabled (new behavior)
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False
)
# Rendered prompt ends with: <|Assistant|><think>\n\n</think>\n\n
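At the serving layer, frameworks that forward chat-template kwargs can toggle this per request. Below is a hedged sketch against vLLM's OpenAI-compatible server, which passes chat_template_kwargs through to apply_chat_template; the endpoint and model id are placeholders, not values from this PR.

# Per-request toggle via vLLM's OpenAI-compatible server.
# base_url, api_key, and the model id below are placeholders.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1",  # placeholder model id
    messages=[{"role": "user", "content": "What is 2 + 2?"}],
    extra_body={"chat_template_kwargs": {"enable_thinking": False}},
)
print(response.choices[0].message.content)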

Implementation

The change adds a single line to inject an empty think block when enable_thinking=False:

{% if enable_thinking is defined and enable_thinking is false %}{{'<think>\n\n</think>\n\n'}}{% endif %}

This follows Qwen3's approach where:

  • enable_thinking=False strictly disables reasoning by injecting an empty think block
  • The empty block signals to the model to skip chain-of-thought generation
  • Disabling thinking is recommended for efficiency-critical scenarios
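To sanity-check the conditional in isolation, the new fragment can be rendered directly with jinja2, the engine transformers uses for chat templates. The harness below is a minimal illustrative sketch; only the fragment itself comes from this PR:

# Render just the added fragment to confirm when the empty think
# block is emitted (requires jinja2 >= 2.11 for the "is false" test).
from jinja2 import Template

fragment = Template(
    "{% if enable_thinking is defined and enable_thinking is false %}"
    "{{'<think>\\n\\n</think>\\n\\n'}}"
    "{% endif %}"
)

print(repr(fragment.render(enable_thinking=False)))  # '<think>\n\n</think>\n\n'
print(repr(fragment.render(enable_thinking=True)))   # ''
print(repr(fragment.render()))                       # '' (undefined is a no-op)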

Backward Compatibility

Fully backward compatible: the template only changes behavior when enable_thinking=False is explicitly set.

The DeepSeek API doesn't support this, and R1 has likely not been trained with this in mind. Why PR something that might not even work, especially when you're outside of the org?

Besides, R1 is solely a reasoning model and V3 is solely a chat/instruct model; they're not combined (not in this iteration at least, and if they were, there's a solid chance they would've been named something else).

No.

Do you get good inference results with this change, @ehartford?

Let me borrow an MI300X node to test it.

erichartford changed pull request status to closed
