---
license: llama3.2
library_name: transformers
base_model:
- meta-llama/Llama-3.2-3B
pipeline_tag: text-generation
---

# Cogito v1 preview - 3B

## NOTE

- The model weights may be updated by Sunday, April 7th. However, these weights will just be a later checkpoint of the model currently being trained.
- The base model (and therefore the model architecture) will remain the same. Similarly, the tokenizer will remain unchanged, as well as how to enable reasoning.
- The complete description will be uploaded along with the eval results in the next few days.

## Introduction

The Cogito LLMs are instruction-tuned generative models (text in/text out). All models are released under an open license for commercial use.

- Cogito models are hybrid reasoning models. You can pick when you want the model to answer normally and when you want it to think longer before answering.
- They have significantly higher multilingual, coding, and tool calling capabilities than their counterparts, and have been optimized for coding, STEM, instruction following, and general helpfulness.
- Early testing demonstrates that Cogito v1-preview models significantly outperform their size-equivalent counterparts on common industry benchmarks in the standard mode.
- Similarly, in the reasoning mode, Cogito v1-preview models outperform their size-equivalent reasoning model counterparts on common industry benchmarks.

## Implementing extended thinking

This section walks through how to enable extended thinking (i.e., reasoning mode) with Cogito models.

- By default, the model will answer in the standard mode.
- To enable thinking, use either of the following two methods:
  - Add a specific system prompt, or
  - Set `enable_thinking=True` during tokenization.

### Method 1 - Add a specific system prompt

To enable thinking, simply use this in the system prompt: `system_instruction = 'Enable deep thinking subroutine.'`

If you already have a system_instruction, then use `system_instruction = 'Enable deep thinking subroutine.' + '\n\n' + system_instruction`.

Here is an example -

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "deepcogito/cogito-v1-preview-llama-3B"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

DEEP_THINKING_INSTRUCTION = "Enable deep thinking subroutine."

prompt = "Write a bash script that takes a matrix represented as a string with format '[1,2],[3,4],[5,6]' and prints the transpose in the same format."

messages = [
    {"role": "system", "content": DEEP_THINKING_INSTRUCTION},
    {"role": "user", "content": prompt}
]

text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
```

Similarly, if you already have a system prompt, you can prepend the `DEEP_THINKING_INSTRUCTION` to it in this way -

```python
DEEP_THINKING_INSTRUCTION = "Enable deep thinking subroutine."

system_prompt = "Reply to each prompt with only code answers - no explanations."

prompt = "Write a bash script that takes a matrix represented as a string with format '[1,2],[3,4],[5,6]' and prints the transpose in the same format."

messages = [
    {"role": "system", "content": DEEP_THINKING_INSTRUCTION + '\n\n' + system_prompt},
    {"role": "user", "content": prompt}
]

text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
```
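Because Method 1 relies only on the system prompt, the same approach also works through the higher-level `pipeline` API. The snippet below is a minimal sketch, not an official recommendation; the dtype/device settings and `max_new_tokens` value are illustrative assumptions:

```python
from transformers import pipeline

# Illustrative sketch: Method 1 (deep thinking system prompt) via the text-generation pipeline.
pipe = pipeline(
    "text-generation",
    model="deepcogito/cogito-v1-preview-llama-3B",
    torch_dtype="auto",
    device_map="auto",
)

messages = [
    {"role": "system", "content": "Enable deep thinking subroutine."},
    {"role": "user", "content": "Write a bash script that takes a matrix represented as a string with format '[1,2],[3,4],[5,6]' and prints the transpose in the same format."},
]

outputs = pipe(messages, max_new_tokens=512)
# When chat messages are passed in, the pipeline returns the full conversation;
# the last message is the assistant reply.
print(outputs[0]["generated_text"][-1]["content"])
```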
### Method 2 - Set enable_thinking=True in the tokenizer

If you are using Hugging Face tokenizers, you can simply add the argument `enable_thinking=True` during tokenization (this option is part of the chat template).

Here is an example -

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "deepcogito/cogito-v1-preview-llama-3B"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "Write a bash script that takes a matrix represented as a string with format '[1,2],[3,4],[5,6]' and prints the transpose in the same format."

messages = [{"role": "user", "content": prompt}]

# Add enable_thinking=True for thinking mode.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True
)

model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
```
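For comparison, the standard (non-thinking) mode needs no special handling. The sketch below reuses the `model`, `tokenizer`, and `prompt` from the example above; simply omit the deep thinking system prompt and the `enable_thinking` flag:

```python
# Standard mode: no "Enable deep thinking subroutine." system prompt, no enable_thinking=True.
messages = [{"role": "user", "content": prompt}]

text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```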