# AdvRahul/Axion-Flash-Reasoning-2B

An optimized, instruction-tuned model for high-speed, complex reasoning tasks. 🚀

Axion-Flash-Reasoning-2B is a fine-tuned version of NVIDIA's state-of-the-art Nemotron-Research-Reasoning-Qwen-1.5B model. It is specifically adapted to be more instruction-friendly and computationally efficient, making it well suited for integration into applications that need strong reasoning capabilities without the overhead of larger models.
## 🚀 Model Details
- Model Creator: AdvRahul
- Base Model: nvidia/Nemotron-Research-Reasoning-Qwen-1.5B (v2 checkpoint)
- Fine-tuning Focus: Enhanced Instruction Following & Practical Usability
- Architecture: Qwen2 (1.5B parameters)
- License: Creative Commons Attribution-NonCommercial 4.0 International (cc-by-nc-4.0)
## 💻 How to Use
This model can be used with the Hugging Face `transformers` library.
### Basic Inference with `pipeline`

The easiest way to get started is with the `text-generation` pipeline:
```python
from transformers import pipeline
import torch

# For optimal performance, run on a GPU
pipe = pipeline(
    "text-generation",
    model="AdvRahul/Axion-Flash-Reasoning-2B",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Qwen models use a specific chat template; apply it to format the conversation
messages = [
    {"role": "system", "content": "You are a helpful assistant that excels at logical reasoning."},
    {"role": "user", "content": "I have 3 apples and I buy 5 more. I then give 2 apples to my friend. How many apples do I have left?"},
]
prompt = pipe.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

outputs = pipe(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])
```
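Note that `generated_text` echoes the prompt followed by the completion. Continuing the example above, one simple way to keep only the model's reply is to slice the prompt off:

```python
# Keep only the newly generated portion of the output
reply = outputs[0]["generated_text"][len(prompt):]
print(reply.strip())
```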
### Optimized Inference (4-bit Quantization)

To achieve "flash" speed and reduce memory usage, you can load the model in 4-bit precision using `bitsandbytes`:
```bash
pip install transformers torch accelerate bitsandbytes
```
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "AdvRahul/Axion-Flash-Reasoning-2B"

tokenizer = AutoTokenizer.from_pretrained(model_id)

# Configure 4-bit quantization (quantization_config supersedes the deprecated load_in_4bit kwarg)
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    quantization_config=quantization_config,
)

messages = [
    {"role": "system", "content": "You are an expert code assistant."},
    {"role": "user", "content": "Write a Python function to calculate the factorial of a number using recursion."},
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=150)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
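For interactive applications you may also want to stream tokens as they are generated. A minimal sketch using `transformers`' `TextStreamer`, reusing the `model`, `tokenizer`, and `inputs` from the example above:

```python
from transformers import TextStreamer

# Print decoded tokens to stdout as they are produced, skipping the echoed prompt
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
_ = model.generate(**inputs, max_new_tokens=150, streamer=streamer)
```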
## 📝 Model Description
### Fine-Tuning Philosophy
While the base Nemotron-Research-Reasoning model demonstrates world-class capabilities in formal reasoning (math, code, logic), Axion-Flash has been further instruction-tuned to make these powerful abilities more accessible and practical for real-world applications. The goal is to bridge the gap between a pure research model and a deployable, instruction-following assistant that developers can easily integrate into their products.
This fine-tuning enhances the model's ability to understand and follow user instructions in a conversational format, unlocking its reasoning power for a broader range of tasks.
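In practice, that conversational format is just a standard chat-message list run through the tokenizer's chat template. A minimal sketch of a multi-turn exchange (the assistant turn here is illustrative, standing in for a previous model reply):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("AdvRahul/Axion-Flash-Reasoning-2B")

# Illustrative multi-turn history; the assistant message stands in for an earlier reply
messages = [
    {"role": "user", "content": "List three prime numbers greater than 10."},
    {"role": "assistant", "content": "11, 13, and 17."},
    {"role": "user", "content": "Now give me their sum."},
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
```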
### Key Capabilities

- Complex Reasoning: Inherits the base model's strength in solving logic puzzles, scientific questions, and multi-step problems.
- Code Generation: Proficient at generating code for a wide range of programming challenges and tasks.
- Mathematical Prowess: Excels at solving mathematical problems, from basic arithmetic to Olympiad-level questions.
- Enhanced Instruction Following: Fine-tuned to better adhere to user instructions and constraints in a chat-like setting (see the sketch after this list).
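As a quick, hypothetical probe of constraint adherence (the prompt below is illustrative, reusing the `pipe` object from the Basic Inference example):

```python
# Hypothetical constrained prompt; reuses `pipe` from the Basic Inference example
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Name three sorting algorithms. Respond with a JSON array of strings and nothing else."},
]
prompt = pipe.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
outputs = pipe(prompt, max_new_tokens=64, do_sample=False)
print(outputs[0]["generated_text"][len(prompt):].strip())  # ideally a bare JSON array
```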
## ℹ️ Base Model Information (Nemotron-Research-Reasoning-Qwen-1.5B)
<details>
<summary>Click to expand details on the powerful base model</summary>
Nemotron-Research-Reasoning-Qwen-1.5B is a leading open-weight model for complex reasoning, trained by NVIDIA using the ProRL (Prolonged Reinforcement Learning) algorithm. This advanced training method enables the model to explore reasoning strategies more deeply, leading to significant performance gains.
The base model was trained on a diverse set of datasets, including:

- DeepScaleR-Preview-Dataset
- Eurus-2-RL-Data
- Reasoning-gym
- IFEval
- SCP-116K
It sets a new state of the art for models in its size class, outperforming competitors by a large margin on benchmarks for math, coding, logic puzzles, and STEM reasoning. For detailed performance metrics, please refer to the original model card.
</details>
## ⚖️ License and Terms of Use
This model is released under the cc-by-nc-4.0 license, inheriting the license of its base model.
This means it is available for research and non-commercial use only. Please review the license terms before using this model in your projects.