DeepNeo: A hybrid model with precision and power

Overview

DeepNeo is a hybrid model that can be used like any other LLM, but it also offers a reasoning mode inspired by NousResearch/DeepHermes-3-Llama-3-8B-Preview, in which the model produces a CoT-style response. This mode is toggled through the system prompt. Unlike NousResearch/DeepHermes-3-Llama-3-8B-Preview, DeepNeo is slightly more flexible in its sizes: we have released an 8B and a 12B model, both based on Mistral AI's models.

Model Details

DeepNeo 12B Key features

  • Developed by: Spestly (Open-Neo) & Kazex (Open-Neo)
  • Released under the Mistral Research License; reach out to Mistral AI for a commercial license
  • Trained with a 128k context window
  • Trained on a large proportion of multilingual and synthetic reasoning data
  • Supports function calling (a minimal sketch is given after the table below)
Feature            Value
Architecture       Dense Transformer
Parameters         ~12B
Layers             40
Heads              32
KV Heads (GQA)     8
Hidden Dim         14336
Head Dim           128
Vocab Size         131,072
Context Length     128k
Attention Pattern  Ragged (128k,32k,32k,32k)
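
The feature list above notes that DeepNeo supports function calling. The snippet below is a minimal sketch of how this is commonly wired up with recent versions of transformers, which can pass docstring-annotated Python functions through the tools argument of apply_chat_template; the get_current_weather helper is an illustrative assumption and not part of the official card, and the exact tool-call output format depends on the chat template shipped with the model.

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "open-neo/DeepNeo-1-12B-Preview"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto"
)

# Hypothetical tool used only for illustration.
def get_current_weather(city: str) -> str:
    """Get the current weather for a given city.

    Args:
        city: The name of the city to look up.
    """
    return "sunny, 21 degrees Celsius"

messages = [
    {"role": "user", "content": "What is the weather like in Paris right now?"}
]

# Recent transformers releases render annotated functions into the tool schema
# expected by Mistral-style chat templates.
input_ids = tokenizer.apply_chat_template(
    messages,
    tools=[get_current_weather],
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)

generated_ids = model.generate(input_ids, max_new_tokens=512, temperature=0.8, do_sample=True)
print(tokenizer.decode(generated_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))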

Usage

Intuitive mode

By default, this mode is active, so you do not need to change anything and can use any system prompt. An example is given below.

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the tokenizer and model (float16 weights, automatically placed on available devices)
tokenizer = AutoTokenizer.from_pretrained("open-neo/DeepNeo-1-12B-Preview")
model = AutoModelForCausalLM.from_pretrained(
    "open-neo/DeepNeo-1-12B-Preview",
    torch_dtype=torch.float16,
    device_map="auto"
)

messages = [
    {"role": "user", "content": "What are the most interesting things to do in Paris?"}
]

input_ids = tokenizer.apply_chat_template(messages, return_tensors="pt").to("cuda")
generated_ids = model.generate(input_ids, max_new_tokens=2500, temperature=0.8, do_sample=True)

print(f"Generated Tokens: {generated_ids.shape[-1]}")
response = tokenizer.decode(generated_ids[0], skip_special_tokens=True)
print(f"Response: {response}")

Reasoning mode

To activate this mode, a few extra steps are needed. Almost any system prompt should work as long as it instructs the model to use the <Thought></Thought> and <Output></Output> tags. An example system prompt is given below; note that it may need tweaking for your specific use case.

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the tokenizer and model (float16 weights, automatically placed on available devices)
tokenizer = AutoTokenizer.from_pretrained("open-neo/DeepNeo-1-12B-Preview")
model = AutoModelForCausalLM.from_pretrained(
    "open-neo/DeepNeo-1-12B-Preview",
    torch_dtype=torch.float16,
    device_map="auto"
)

messages = [
    {"role": "system", "content": "You are a deep-thinking AI model. You must put your thoughts in the <Thought> tags and your output in the <Output> tags."},
    {"role": "user", "content": "What are the most interesting things to do in Paris?"}
]

input_ids = tokenizer.apply_chat_template(messages, return_tensors="pt").to("cuda")
generated_ids = model.generate(input_ids, max_new_tokens=2500, temperature=0.8, do_sample=True)

print(f"Generated Tokens: {generated_ids.shape[-1]}")
response = tokenizer.decode(generated_ids[0], skip_special_tokens=True)
print(f"Response: {response}")

Citations

@misc{deepneo-1,
      title={DeepNeo: A hybrid model with precision and power}, 
      author={Aayan Mishra and Krish Thumar},
      howpublished={https://huggingface.co/collections/open-neo/deepneo-1-67aea4c0f086ab0f70ed5720},
      year={2025}
}