 ---
+license: llama3.2
 library_name: transformers
+base_model:
+- meta-llama/Llama-3.2-3B
+pipeline_tag: text-generation
+---
+# Cogito v1 preview - 3B
+## NOTE
+- The model weights may be updated by Sunday, April 7th. However, these weights will just be a later checkpoint of the model currently being trained.
+- The base model (and therefore the model architecture) will remain the same. The tokenizer and the method for enabling reasoning will also remain unchanged.
+- The complete description will be uploaded along with the evaluation results in the next few days.
+## Introduction
+The Cogito LLMs are instruction-tuned generative models (text in/text out). All models are released under an open license for commercial use.
+- Cogito models are hybrid reasoning models. You can pick when you want the model to answer normally and when you want it to think longer before answering.
+- They have significantly higher multilingual, coding, and tool-calling capabilities than their counterparts, and have been optimized for coding, STEM, instruction following, and general helpfulness.
+- Early testing demonstrates that Cogito v1-preview models significantly outperform their size-equivalent counterparts on common industry benchmarks in the standard mode.
+- Similarly, in the reasoning mode, Cogito v1-preview models outperform their size-equivalent reasoning model counterparts on common industry benchmarks.
+## Implementing extended thinking
+This section walks through how to enable extended thinking (i.e., reasoning mode) in Cogito models.
+- By default, the model will answer in the standard mode.
+- To enable thinking, use either of two methods:
+  - Add a specific system prompt, or
+  - Set `enable_thinking=True` when applying the chat template.
+### Method 1 - Add a specific system prompt
+To enable thinking, set the system prompt to `system_instruction = 'Enable deep thinking subroutine.'`.
+If you already have a `system_instruction`, prepend the thinking instruction to it: `system_instruction = 'Enable deep thinking subroutine.' + '\n\n' + system_instruction`.
+Here is an example -
+```python
+from transformers import AutoModelForCausalLM, AutoTokenizer
+model_name = "deepcogito/cogito-v1-preview-llama-3B"
+model = AutoModelForCausalLM.from_pretrained(
+    model_name,
+    torch_dtype="auto",
+    device_map="auto"
+)
+tokenizer = AutoTokenizer.from_pretrained(model_name)
+DEEP_THINKING_INSTRUCTION = "Enable deep thinking subroutine."
+prompt = "Write a bash script that takes a matrix represented as a string with format '[1,2],[3,4],[5,6]' and prints the transpose in the same format."
+messages = [
+    {"role": "system", "content": DEEP_THINKING_INSTRUCTION},
+    {"role": "user", "content": prompt}
+]
+text = tokenizer.apply_chat_template(
+    messages,
+    tokenize=False,
+    add_generation_prompt=True
+)
+model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
+generated_ids = model.generate(
+    **model_inputs,
+    max_new_tokens=512
+)
+generated_ids = [
+    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
+]
+response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
+print(response)
+```
+Similarly, if you have a system prompt, you can prepend the `DEEP_THINKING_INSTRUCTION` to it in this way -
+```python
+# Reuses `tokenizer` from the previous example.
+DEEP_THINKING_INSTRUCTION = "Enable deep thinking subroutine."
+system_prompt = "Reply to each prompt with only code answers - no explanations."
+prompt = "Write a bash script that takes a matrix represented as a string with format '[1,2],[3,4],[5,6]' and prints the transpose in the same format."
+messages = [
+    {"role": "system", "content": DEEP_THINKING_INSTRUCTION + '\n\n' + system_prompt},
+    {"role": "user", "content": prompt}
+]
+text = tokenizer.apply_chat_template(
+    messages,
+    tokenize=False,
+    add_generation_prompt=True
+)
+```
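The two cases above (the thinking instruction alone, or the thinking instruction prepended to an existing system prompt) can be folded into one helper. The following is a minimal sketch: `build_messages` and its parameters are hypothetical names used for illustration and are not part of the Cogito release or of transformers.

```python
from typing import Dict, List, Optional

DEEP_THINKING_INSTRUCTION = "Enable deep thinking subroutine."


def build_messages(
    prompt: str,
    system_prompt: Optional[str] = None,
    thinking: bool = False,
) -> List[Dict[str, str]]:
    """Hypothetical helper: build a chat `messages` list for Method 1."""
    parts = []
    if thinking:
        # The thinking instruction goes first, as in the examples above.
        parts.append(DEEP_THINKING_INSTRUCTION)
    if system_prompt:
        parts.append(system_prompt)

    messages = []
    if parts:
        # Join with a blank line, matching the '\n\n' separator used above.
        messages.append({"role": "system", "content": "\n\n".join(parts)})
    messages.append({"role": "user", "content": prompt})
    return messages
```

The returned list can be passed to `tokenizer.apply_chat_template` in the same way as the `messages` lists in the examples above.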
+### Method 2 - Set enable_thinking=True in the tokenizer
+If you are using Hugging Face tokenizers, you can simply add the argument `enable_thinking=True` when applying the chat template (this option is added to the chat template).
+Here is an example -
+```python
+from transformers import AutoModelForCausalLM, AutoTokenizer
+model_name = "deepcogito/cogito-v1-preview-llama-3B"
+model = AutoModelForCausalLM.from_pretrained(
+    model_name,
+    torch_dtype="auto",
+    device_map="auto"
+)
+tokenizer = AutoTokenizer.from_pretrained(model_name)
+prompt = "Write a bash script that takes a matrix represented as a string with format '[1,2],[3,4],[5,6]' and prints the transpose in the same format."
+messages = [{"role": "user", "content": prompt}]
+# Add enable_thinking=True for thinking mode.
+text = tokenizer.apply_chat_template(
+    messages,
+    tokenize=False,
+    add_generation_prompt=True,
+    enable_thinking=True
+)
+model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
+generated_ids = model.generate(
+    **model_inputs,
+    max_new_tokens=512
+)
+generated_ids = [
+    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
+]
+response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
+print(response)
+```
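Both examples slice each output sequence with `output_ids[len(input_ids):]`. For decoder-only models, `generate` returns the prompt tokens followed by the newly generated ones, so the slice keeps only the completion. The sketch below reproduces that slicing on plain Python lists. `strip_prompt_tokens` is a hypothetical helper name, and the token IDs are made up for illustration.

```python
from typing import List, Sequence


def strip_prompt_tokens(
    input_ids: Sequence[Sequence[int]],
    output_ids: Sequence[Sequence[int]],
) -> List[List[int]]:
    """Drop the leading prompt tokens from each generated sequence."""
    return [
        list(out[len(inp):])  # keep only what follows the prompt
        for inp, out in zip(input_ids, output_ids)
    ]


# Made-up token IDs: a 3-token prompt followed by 3 newly generated tokens.
prompt_batch = [[101, 7, 8]]
generated_batch = [[101, 7, 8, 42, 43, 2]]
completion = strip_prompt_tokens(prompt_batch, generated_batch)
```

Here `completion` holds only the tokens after the prompt, which is what gets passed to `tokenizer.batch_decode` in the examples above.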