DeepNeo: A hybrid model with precision and power
Overview
DeepNeo is a hybrid model that can be used like any other LLM, but it also offers a mode, inspired by NousResearch/DeepHermes-3-Llama-3-8B-Preview, in which the model produces a CoT-like (chain-of-thought) response. This mode is toggled through the system prompt. Unlike NousResearch/DeepHermes-3-Llama-3-8B-Preview, DeepNeo is slightly more flexible in its sizes: we have released an 8B and a 12B model, both based on Mistral AI's models.
Model Details
DeepNeo 12B Key features
- Developed by: Spestly (Open-Neo) & Kazex (Open-Neo)
- Released under the Mistral Research License; reach out to Mistral AI for a commercial license
- Trained with a 128k context window
- Trained on a large proportion of multilingual and synthetic reasoning data
- Supports function calling (see the sketch after the table below)
| Feature | Value |
|---|---|
| Architecture | Dense Transformer |
| Parameters | ~12B |
| Layers | 40 |
| Heads | 32 |
| KV Heads (GQA) | 8 |
| Hidden Dim | 14336 |
| Head Dim | 128 |
| Vocab Size | 131,072 |
| Context Length | 128k |
| Attention Pattern | Ragged (128k, 32k, 32k, 32k) |
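Because the model supports function calling, the sketch below shows one way to pass a tool definition through the Transformers chat template. The `get_weather` function and its schema are hypothetical, and whether DeepNeo's chat template renders tools and returns a structured tool call is an assumption to verify against the model's actual template; treat this as an illustration, not documented behaviour.

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Hypothetical tool: the name, signature, and docstring are illustrative only.
def get_weather(city: str) -> str:
    """Get the current weather for a city.

    Args:
        city: The city to look up.
    """
    ...

tokenizer = AutoTokenizer.from_pretrained("open-neo/DeepNeo-1-12B-Preview")
model = AutoModelForCausalLM.from_pretrained(
    "open-neo/DeepNeo-1-12B-Preview",
    torch_dtype=torch.float16,
    device_map="auto"
)

messages = [
    {"role": "user", "content": "What is the weather like in Paris right now?"}
]

# Recent versions of Transformers accept tool definitions in apply_chat_template;
# whether DeepNeo emits a parseable tool call in response is an assumption.
input_ids = tokenizer.apply_chat_template(
    messages,
    tools=[get_weather],
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)

generated_ids = model.generate(input_ids, max_new_tokens=512, temperature=0.8, do_sample=True)
print(tokenizer.decode(generated_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```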
Usage
Intuitive mode
This mode is active by default, so nothing needs to be changed; you can use any system prompt (or none at all). An example is given below.
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("open-neo/DeepNeo-1-12B-Preview")
model = AutoModelForCausalLM.from_pretrained(
    "open-neo/DeepNeo-1-12B-Preview",
    torch_dtype=torch.float16,
    device_map="auto"
)

# No system prompt is required: intuitive mode is the default.
messages = [
    {"role": "user", "content": "What are the most interesting things to do in Paris?"}
]

input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
generated_ids = model.generate(input_ids, max_new_tokens=2500, temperature=0.8, do_sample=True)
print(f"Generated Tokens: {generated_ids.shape[-1]}")

response = tokenizer.decode(generated_ids[0], skip_special_tokens=True)
print(f"Response: {response}")
```
Reasoning mode
To activate this mode, a few extra steps are needed. Almost any system prompt should work as long as it mentions the `<Thought></Thought>` and `<Output></Output>` tags. An example system prompt is given below; note that it may require tweaking for your specific use case.
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("open-neo/DeepNeo-1-12B-Preview")
model = AutoModelForCausalLM.from_pretrained(
    "open-neo/DeepNeo-1-12B-Preview",
    torch_dtype=torch.float16,
    device_map="auto"
)

# A system prompt mentioning the <Thought> and <Output> tags switches the model into reasoning mode.
messages = [
    {"role": "system", "content": "You are a deep-thinking AI model. You must put your thoughts in the <Thought> tags and your output in the <Output> tags."},
    {"role": "user", "content": "What are the most interesting things to do in Paris?"}
]

input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
generated_ids = model.generate(input_ids, max_new_tokens=2500, temperature=0.8, do_sample=True)
print(f"Generated Tokens: {generated_ids.shape[-1]}")

response = tokenizer.decode(generated_ids[0], skip_special_tokens=True)
print(f"Response: {response}")
```
Citations
```bibtex
@misc{deepneo-1,
      title={DeepNeo: A hybrid model with precision and power},
      author={Aayan Mishra and Krish Thumar},
      howpublished={https://huggingface.co/collections/open-neo/deepneo-1-67aea4c0f086ab0f70ed5720},
      year={2025}
}
```