DeepSeek R1 Code Reasoning 8B
Model Description
This model is a fine-tuned version of unsloth/DeepSeek-R1-Distill-Llama-8B specialized for advanced code reasoning tasks. It was trained on challenging programming problems from the nvidia/OpenCodeReasoning dataset, restricted to problems labeled "VERY_HARD" or given a numeric difficulty of 10 or 11.
Model Details
- Base Model: DeepSeek-R1-Distill-Llama-8B
- Model Type: Causal Language Model (Fine-tuned)
- Architecture: LLaMA-based transformer
- Parameters: ~8 billion
- Training Data: Filtered nvidia/OpenCodeReasoning dataset (VERY_HARD difficulty problems)
- Fine-tuning Method: LoRA (Low-Rank Adaptation)
- License: Apache 2.0
Training Details
- Training Framework: Unsloth + Transformers
- Fine-tuning Method: LoRA with rank 16
- Batch Size: 2 per device with 4 gradient accumulation steps
- Learning Rate: 2e-4
- Optimizer: AdamW 8-bit
- Precision: Mixed precision (FP16/BF16)
- Max Sequence Length: 2048 tokens
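As a rough illustration of what these settings imply, the sketch below computes the effective batch size per optimizer step and the number of trainable parameters LoRA adds to a single weight matrix. The 4096 x 4096 projection shape is an assumption chosen to match the hidden size of Llama-based 8B models. This card does not state which modules were adapted, so the helper names and the example shape are illustrative only.

```python
def effective_batch_size(per_device: int, grad_accum_steps: int, num_devices: int = 1) -> int:
    """Number of samples that contribute to each optimizer step."""
    return per_device * grad_accum_steps * num_devices


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters LoRA adds to one d_in x d_out weight matrix.

    LoRA learns two low-rank factors: A (rank x d_in) and B (d_out x rank).
    """
    return rank * (d_in + d_out)


# Values from the list above: batch size 2, 4 accumulation steps, rank 16.
print(effective_batch_size(2, 4))
# Assumed shape: a single square 4096 x 4096 projection.
print(lora_params(4096, 4096, 16))
```

With one device this gives 8 samples per optimizer step, and 131,072 added parameters for the assumed projection, compared with roughly 16.8 million in the frozen matrix itself.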
Dataset
The model was trained on a carefully filtered subset of the nvidia/OpenCodeReasoning dataset:
- Source: nvidia/OpenCodeReasoning (split_0)
- Filter Criteria: Only problems with difficulty "VERY_HARD", 10, or 11
- Columns Used: input (problem), output (expected result), solution (reasoning)
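The filter described above can be sketched as a simple predicate. This is a minimal illustration, not the original preprocessing script: it assumes the difficulty label lives in a column named `difficulty` and may arrive as either a string or an integer. The rows below are toy stand-ins, not real dataset records.

```python
VERY_HARD_LABELS = {"VERY_HARD", "10", "11"}


def is_very_hard(example: dict) -> bool:
    """Keep only rows whose difficulty label is in the target set."""
    # str() lets the check work whether the label is stored as text or a number.
    return str(example.get("difficulty", "")).strip() in VERY_HARD_LABELS


# Toy rows standing in for dataset records (hypothetical values).
rows = [
    {"id": "a", "difficulty": "EASY"},
    {"id": "b", "difficulty": "VERY_HARD"},
    {"id": "c", "difficulty": 10},
    {"id": "d", "difficulty": "11"},
    {"id": "e", "difficulty": "9"},
]

kept = [row["id"] for row in rows if is_very_hard(row)]
print(kept)
```

When working with the Hugging Face `datasets` library, the same predicate could be passed to `Dataset.filter`.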
Intended Use
This model is designed for:
- Advanced algorithmic problem solving
- Code generation with detailed reasoning
- Educational purposes for understanding complex programming concepts
- Research in automated code reasoning
Usage
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Soumyajit-7/code-reasoning-deepseek-8b")
model = AutoModelForCausalLM.from_pretrained(
    "Soumyajit-7/code-reasoning-deepseek-8b",
    torch_dtype=torch.float16,
    device_map="auto",
)
# Define the prompt template
prompt_template = """Below is an instruction that describes a coding task, paired with an input that provides further context.
Write a response that appropriately completes the request.
Before answering, think carefully about the problem and create a step-by-step chain of thoughts to ensure a logical and accurate response.
### Instruction:
You are a coding expert with advanced knowledge in programming, algorithms, and problem-solving.
Please solve the following coding problem with detailed reasoning.
### Problem:
{problem}
### Response:
<think>"""
# Example usage
problem = """
Problem description.
Vipul is a hardworking super-hero who maintains the bracket ratio of all the strings in the world. Recently he indulged himself in saving the string population so much that he lost his ability for checking brackets (luckily, not permanently ).Being his super-hero friend help him in his time of hardship.
Input
The first line of the input contains an integer T denoting the number of test cases. The description of T test cases follows.
The first line of each test case contains a single string S denoting the string to be checked.
Output
For each test case, output a single line printing "YES" or "NO" (without " " and in uppercase only) , denoting if the brackets in the given string is balanced or not .
Constraints
1 ≤ T ≤ 10
1 ≤ length of S ≤ 60
Example
Input:
3
((()))
(())()
()(()
Output:
YES
YES
NO
Explanation
Example is self-explanatory.
"""
prompt = prompt_template.format(problem=problem)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=1200,
    temperature=0.7,
    do_sample=True,
    pad_token_id=tokenizer.eos_token_id,
)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response.split("### Response:")[1])
Model Capabilities
The model excels at:
- Algorithm Design: Creating efficient algorithms for complex problems
- Code Optimization: Improving time and space complexity
- Problem Analysis: Breaking down complex problems into manageable steps
- Mathematical Reasoning: Solving problems requiring mathematical insights
- Data Structure Implementation: Designing and implementing advanced data structures
Prompt Format
The model expects prompts in the following format:
### Instruction:
You are a coding expert with advanced knowledge in programming, algorithms, and problem-solving.
Please solve the following coding problem with detailed reasoning.
### Problem:
[Your coding problem here]
### Response:
<think>
[The model will provide step-by-step reasoning here]
</think>
[Final solution/answer here]
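Because the response interleaves reasoning and the final answer, it can be convenient to separate the two after decoding. The helper below is a minimal sketch under the assumption that the `<think>` and `</think>` tags survive decoding as plain text. The function name `split_reasoning` is hypothetical and not part of any library.

```python
def split_reasoning(response: str) -> tuple[str, str]:
    """Split generated text into (reasoning, final_answer).

    If no closing </think> tag is present (for example, generation hit the
    token limit mid-reasoning), everything is treated as reasoning and the
    final answer is returned as an empty string.
    """
    open_tag, close_tag = "<think>", "</think>"
    text = response
    # Drop everything up to and including the first opening tag, if any.
    if open_tag in text:
        text = text.split(open_tag, 1)[1]
    if close_tag not in text:
        return text.strip(), ""
    reasoning, answer = text.split(close_tag, 1)
    return reasoning.strip(), answer.strip()


reasoning, answer = split_reasoning("<think>\nuse a counter\n</think>\nprint('YES')")
print(reasoning)
print(answer)
```

An empty final answer is a useful signal that `max_new_tokens` was too small for the model to finish its reasoning.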
Performance
This model was trained specifically on the most challenging programming problems and is intended to improve performance on:
- Advanced algorithmic challenges
- Complex data structure problems
- Mathematical programming tasks
- Optimization problems
Limitations
- The model is specialized for code reasoning and may not perform as well on general conversation
- Training focused on very hard problems, so the model may produce over-engineered solutions for simple tasks
- Like all language models, it may occasionally generate incorrect or suboptimal solutions
- The model should be used as a coding assistant, not a replacement for human review
Training Infrastructure
- GPU: NVIDIA A100/V100 (recommended)
- Memory: 16GB+ GPU memory required
- Framework: Unsloth for efficient training
- Quantization: Trained with 4-bit quantization for memory efficiency
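A back-of-the-envelope calculation shows why 4-bit quantization matters at this scale. The sketch below estimates memory for the weights alone, using the approximate 8 billion parameter count from this card. It ignores activations, the KV cache, optimizer state, and quantization overhead, so real usage will be higher.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate memory for model weights alone, in decimal gigabytes."""
    return num_params * bits_per_param / 8 / 1e9


# ~8 billion parameters, as stated in Model Details.
print(weight_memory_gb(8e9, 16))  # FP16/BF16 weights
print(weight_memory_gb(8e9, 4))   # 4-bit quantized weights
```

Under these assumptions the weights take about 16 GB in 16-bit precision and about 4 GB in 4-bit form.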
Ethical Considerations
This model is designed for educational and research purposes. Users should:
- Verify generated code before using in production
- Understand the logic behind solutions rather than blindly copying
- Use responsibly for learning and problem-solving enhancement
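One lightweight way to verify generated code is to run it against known input/output pairs before trusting it. The sketch below does this for the bracket-balancing problem from the Usage section, using the sample cases given there. `check_candidate` and `reference` are hypothetical names; in practice `candidate` would be a function extracted from the model's output.

```python
def is_balanced(s: str) -> bool:
    """Return True if every '(' has a matching ')' in the right order."""
    depth = 0
    for ch in s:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # a ')' appeared before its matching '('
                return False
    return depth == 0


def check_candidate(candidate, cases) -> list:
    """Return the inputs on which `candidate` disagrees with the expected label."""
    return [s for s, expected in cases if candidate(s) != expected]


# Sample cases from the example problem in the Usage section.
SAMPLE_CASES = [("((()))", "YES"), ("(())()", "YES"), ("()(()", "NO")]


def reference(s: str) -> str:
    return "YES" if is_balanced(s) else "NO"


failures = check_candidate(reference, SAMPLE_CASES)
print(failures)
```

An empty list means the candidate agrees with every sample. Passing the samples is necessary but not sufficient, so edge cases still deserve human review.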
Future Work
Potential improvements:
- Training on additional challenging datasets
- Multi-language code generation support
- Integration with code execution environments
- Fine-tuning on specific programming domains
Citation
If you use this model in your research, please cite:
@misc{code-reasoning-deepseek-8b,
  title={DeepSeek R1 Code Reasoning 8B},
  author={Soumyajit},
  year={2025},
  howpublished={\url{https://huggingface.co/Soumyajit-7/code-reasoning-deepseek-8b}}
}
Acknowledgments
- Based on DeepSeek-R1-Distill-Llama-8B
- Trained using Unsloth for efficient fine-tuning
- Dataset from NVIDIA's OpenCodeReasoning project
- Special thanks to the open-source community for making this possible
Model trained and maintained by Soumyajit-7. For questions or issues, please open an issue in the repository.