---
license: llama3.2
library_name: transformers
base_model:
- meta-llama/Llama-3.2-3B
pipeline_tag: text-generation
---

# Cogito v1 preview - 3B

## NOTE

- The model weights may be updated by Sunday, April 7th. However, these weights will just be a later checkpoint of the model currently being trained.
- The base model (and therefore the model architecture) will remain the same. Similarly, the tokenizer will remain unchanged, as well as how to enable reasoning.
- The complete description will be uploaded along with the eval results in the next few days.

## Introduction

The Cogito LLMs are instruction-tuned generative models (text in/text out). All models are released under an open license for commercial use.

- Cogito models are hybrid reasoning models. You can pick when you want the model to answer normally and when you want it to think longer before answering.
- They have significantly higher multilingual, coding, and tool calling capabilities than their counterparts, and have been optimized for coding, STEM, instruction following, and general helpfulness.
- Early testing demonstrates that Cogito v1-preview models significantly outperform their size-equivalent counterparts on common industry benchmarks in the standard mode.
- Similarly, in the reasoning mode, Cogito v1-preview models outperform their size-equivalent reasoning model counterparts on common industry benchmarks.

## Implementing extended thinking

This section walks through how to enable extended thinking (i.e., reasoning mode) with Cogito models.

- By default, the model will answer in the standard mode.
- To enable thinking, use either of the following two methods:
  - Add a specific system prompt, or
  - Set `enable_thinking=True` during tokenization.

### Method 1 - Add a specific system prompt

To enable thinking, simply use this in the system prompt: `system_instruction = 'Enable deep thinking subroutine.'`

If you already have a system_instruction, then use `system_instruction = 'Enable deep thinking subroutine.' + '\n\n' + system_instruction`.

Here is an example -

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "deepcogito/cogito-v1-preview-llama-3B"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

DEEP_THINKING_INSTRUCTION = "Enable deep thinking subroutine."

prompt = "Write a bash script that takes a matrix represented as a string with format '[1,2],[3,4],[5,6]' and prints the transpose in the same format."

messages = [
    {"role": "system", "content": DEEP_THINKING_INSTRUCTION},
    {"role": "user", "content": prompt}
]

text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
```

Similarly, if you already have a system prompt, you can prepend the `DEEP_THINKING_INSTRUCTION` to it in this way -

```python
DEEP_THINKING_INSTRUCTION = "Enable deep thinking subroutine."

system_prompt = "Reply to each prompt with only code answers - no explanations."

prompt = "Write a bash script that takes a matrix represented as a string with format '[1,2],[3,4],[5,6]' and prints the transpose in the same format."

messages = [
    {"role": "system", "content": DEEP_THINKING_INSTRUCTION + '\n\n' + system_prompt},
    {"role": "user", "content": prompt}
]

text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
```
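Because Method 1 relies only on the system prompt, the same approach also works through the higher-level `pipeline` API. The snippet below is a minimal sketch, not an official recommendation; the dtype/device settings and `max_new_tokens` value are illustrative assumptions:

```python
from transformers import pipeline

# Illustrative sketch: Method 1 (deep thinking system prompt) via the text-generation pipeline.
pipe = pipeline(
    "text-generation",
    model="deepcogito/cogito-v1-preview-llama-3B",
    torch_dtype="auto",
    device_map="auto",
)

messages = [
    {"role": "system", "content": "Enable deep thinking subroutine."},
    {"role": "user", "content": "Write a bash script that takes a matrix represented as a string with format '[1,2],[3,4],[5,6]' and prints the transpose in the same format."},
]

outputs = pipe(messages, max_new_tokens=512)
# When chat messages are passed in, the pipeline returns the full conversation;
# the last message is the assistant reply.
print(outputs[0]["generated_text"][-1]["content"])
```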
### Method 2 - Set enable_thinking=True in the tokenizer

If you are using Hugging Face tokenizers, you can simply add the argument `enable_thinking=True` during tokenization (this option is part of the chat template).

Here is an example -

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "deepcogito/cogito-v1-preview-llama-3B"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "Write a bash script that takes a matrix represented as a string with format '[1,2],[3,4],[5,6]' and prints the transpose in the same format."

messages = [{"role": "user", "content": prompt}]

# Add enable_thinking=True for thinking mode.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True
)

model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
```
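For comparison, the standard (non-thinking) mode needs no special handling. The sketch below reuses the `model`, `tokenizer`, and `prompt` from the example above; simply omit the deep thinking system prompt and the `enable_thinking` flag:

```python
# Standard mode: no "Enable deep thinking subroutine." system prompt, no enable_thinking=True.
messages = [{"role": "user", "content": prompt}]

text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```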