---
license: llama3.2
library_name: transformers
base_model:
  - meta-llama/Llama-3.2-3B
pipeline_tag: text-generation
---

Cogito v1 preview - 3B

NOTE

  • The model weights may be updated by Sunday, April 7th. However, these weights will simply be a later checkpoint of the model currently being trained.
  • The base model (and therefore the model architecture) will remain the same. Similarly, the tokenizer and the method for enabling reasoning will remain unchanged.
  • The complete model description will be uploaded along with the evaluation results in the next few days.

Introduction

The Cogito LLMs are instruction-tuned generative models (text in, text out). All models are released under an open license for commercial use.

  • Cogito models are hybrid reasoning models: you can choose whether the model answers directly or thinks longer before answering.
  • They have significantly stronger multilingual, coding, and tool-calling capabilities than their size-equivalent counterparts, and have been optimized for coding, STEM, instruction following, and general helpfulness.
  • Early testing shows that, in standard mode, Cogito v1-preview models significantly outperform their size-equivalent counterparts on common industry benchmarks.
  • Similarly, in reasoning mode, Cogito v1-preview models outperform their size-equivalent reasoning counterparts on common industry benchmarks.

Implementing extended thinking

This section walks through how to enable extended thinking (i.e., reasoning mode) with Cogito models.

  • By default, the model answers in the standard mode.
  • To enable thinking, use either of the following methods:
    • Add a specific system prompt, or
    • Set enable_thinking=True during tokenization.

Method 1 - Add a specific system prompt.

To enable thinking, set the system prompt to system_instruction = 'Enable deep thinking subroutine.'

If you already have a system_instruction, prepend the thinking instruction to it: system_instruction = 'Enable deep thinking subroutine.' + '\n\n' + system_instruction.

Here is an example -

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "deepcogito/cogito-v1-preview-llama-3B"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

DEEP_THINKING_INSTRUCTION = "Enable deep thinking subroutine."
prompt = "Write a bash script that takes a matrix represented as a string with format '[1,2],[3,4],[5,6]' and prints the transpose in the same format."

messages = [
    {"role": "system", "content": DEEP_THINKING_INSTRUCTION},
    {"role": "user", "content": prompt}
]

text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
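
You can then print(response) to inspect the output. For comparison, the same prompt without the deep thinking instruction runs in the standard (non-thinking) mode; the sketch below is illustrative and simply reuses the model and tokenizer already loaded above -

# Standard mode: omit the deep thinking instruction from the messages.
standard_messages = [{"role": "user", "content": prompt}]

standard_text = tokenizer.apply_chat_template(
    standard_messages,
    tokenize=False,
    add_generation_prompt=True
)
standard_inputs = tokenizer([standard_text], return_tensors="pt").to(model.device)

standard_ids = model.generate(**standard_inputs, max_new_tokens=512)
# Strip the prompt tokens so only the newly generated text is decoded.
standard_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(standard_inputs.input_ids, standard_ids)
]
standard_response = tokenizer.batch_decode(standard_ids, skip_special_tokens=True)[0]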

Similarly, if you already have a system prompt, you can prepend the DEEP_THINKING_INSTRUCTION to it in this way -

DEEP_THINKING_INSTRUCTION = "Enable deep thinking subroutine."

system_prompt = "Reply to each prompt with only code answers - no explanations."
prompt = "Write a bash script that takes a matrix represented as a string with format '[1,2],[3,4],[5,6]' and prints the transpose in the same format."


messages = [
    {"role": "system", "content": DEEP_THINKING_INSTRUCTION + '\n\n' + system_prompt},
    {"role": "user", "content": prompt}
]

text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
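
Generation then proceeds exactly as in the previous example; a minimal continuation, assuming model and tokenizer are already loaded as above -

model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
# Strip the prompt tokens so only the newly generated text is decoded.
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]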

Method 2 - Set enable_thinking=True in the tokenizer

If you are using Hugging Face tokenizers, you can simply add the argument enable_thinking=True during tokenization (this option is part of the chat template). Here is an example -

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "deepcogito/cogito-v1-preview-llama-3B"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "Write a bash script that takes a matrix represented as a string with format '[1,2],[3,4],[5,6]' and prints the transpose in the same format."

messages = [{"role": "user", "content": prompt}]

# Add enable_thinking=True for thinking mode.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True
)

model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
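
To switch between the two modes programmatically, it can be convenient to wrap the pattern above in a small helper. The sketch below is illustrative only: the helper name generate_response and its arguments are our own, not part of the model's API, and it assumes the chat template also accepts enable_thinking=False for the default standard mode, with model and tokenizer loaded as in the examples above -

def generate_response(messages, enable_thinking=False, max_new_tokens=512):
    # Apply the chat template; enable_thinking toggles the reasoning mode.
    text = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True,
        enable_thinking=enable_thinking
    )
    model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
    generated_ids = model.generate(**model_inputs, max_new_tokens=max_new_tokens)
    # Strip the prompt tokens so only the newly generated text is decoded.
    generated_ids = [
        output_ids[len(input_ids):]
        for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
    ]
    return tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]

# Compare standard mode and extended thinking on the same prompt.
messages = [{"role": "user", "content": prompt}]
print(generate_response(messages, enable_thinking=False))
print(generate_response(messages, enable_thinking=True))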