DeepSeek R1 Code Reasoning 8B

Model Description

This model is a fine-tuned version of unsloth/DeepSeek-R1-Distill-Llama-8B specialized for advanced code reasoning. It was trained on challenging programming problems from the nvidia/OpenCodeReasoning dataset, specifically those rated "VERY_HARD" (difficulty levels 10 and 11).

Model Details

  • Base Model: DeepSeek-R1-Distill-Llama-8B
  • Model Type: Causal Language Model (Fine-tuned)
  • Architecture: LLaMA-based transformer
  • Parameters: ~8 billion
  • Training Data: Filtered nvidia/OpenCodeReasoning dataset (VERY_HARD difficulty problems)
  • Fine-tuning Method: LoRA (Low-Rank Adaptation)
  • License: Apache 2.0

Training Details

  • Training Framework: Unsloth + Transformers
  • Fine-tuning Method: LoRA with rank 16
  • Batch Size: 2 per device with 4 gradient accumulation steps
  • Learning Rate: 2e-4
  • Optimizer: AdamW 8-bit
  • Precision: Mixed precision (FP16/BF16)
  • Max Sequence Length: 2048 tokens
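Below is a minimal sketch of what this setup likely looked like with Unsloth's standard LoRA workflow. It is illustrative only: the LoRA alpha, target modules, dataset preparation, and output directory are assumptions, not the exact training script.

import torch
from unsloth import FastLanguageModel
from trl import SFTTrainer
from transformers import TrainingArguments

# Load the base model in 4-bit with the sequence length listed above
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/DeepSeek-R1-Distill-Llama-8B",
    max_seq_length=2048,
    load_in_4bit=True,
)

# Attach LoRA adapters; rank 16 matches the card, alpha and target
# modules are assumptions (not stated in the card)
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=train_dataset,  # filtered OpenCodeReasoning examples (see Dataset)
    dataset_text_field="text",    # assumption: prompts rendered into a single text field
    max_seq_length=2048,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        learning_rate=2e-4,
        optim="adamw_8bit",
        fp16=not torch.cuda.is_bf16_supported(),
        bf16=torch.cuda.is_bf16_supported(),
        output_dir="outputs",
    ),
)
trainer.train()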

Dataset

The model was trained on a carefully filtered subset of the nvidia/OpenCodeReasoning dataset:

  • Source: nvidia/OpenCodeReasoning (split_0)
  • Filter Criteria: Only problems with difficulty "VERY_HARD", 10, or 11
  • Columns Used: input (problem), output (expected result), solution (reasoning)
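A hedged sketch of the filtering step, assuming the standard datasets API; the exact split name and the encoding of the difficulty field (string vs. numeric) are assumptions based on the description above:

from datasets import load_dataset

# Load the split_0 configuration of OpenCodeReasoning
ds = load_dataset("nvidia/OpenCodeReasoning", "split_0", split="split_0")

# Keep only the hardest problems; the field name and value encoding
# ("VERY_HARD" vs. 10/11) are assumptions
hard = ds.filter(lambda ex: ex["difficulty"] in ("VERY_HARD", "10", "11", 10, 11))
print(len(hard), "examples retained")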

Intended Use

This model is designed for:

  • Advanced algorithmic problem solving
  • Code generation with detailed reasoning
  • Educational purposes for understanding complex programming concepts
  • Research in automated code reasoning

Usage

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Soumyajit-7/code-reasoning-deepseek-8b")
model = AutoModelForCausalLM.from_pretrained(
    "Soumyajit-7/code-reasoning-deepseek-8b",
    torch_dtype=torch.float16,
    device_map="auto"
)

# Define the prompt template
prompt_template = """Below is an instruction that describes a coding task, paired with an input that provides further context. 
Write a response that appropriately completes the request. 
Before answering, think carefully about the problem and create a step-by-step chain of thoughts to ensure a logical and accurate response.

### Instruction:
You are a coding expert with advanced knowledge in programming, algorithms, and problem-solving. 
Please solve the following coding problem with detailed reasoning.

### Problem:
{problem}

### Response:
<think>"""

# Example usage
problem = """
Problem description.
Vipul is a hardworking super-hero who maintains the bracket ratio of all the strings in the world. Recently he indulged himself in saving the string population so much that he lost his ability for checking brackets (luckily, not permanently). Being his super-hero friend, help him in his time of hardship.
Input

The first line of the input contains an integer T denoting the number of test cases. The description of T test cases follows.
The first line of each test case contains a single string S denoting the string to be checked.


Output

For each test case, output a single line printing "YES" or "NO" (without quotes and in uppercase only), denoting whether the brackets in the given string are balanced.


Constraints

1 ≤ T ≤ 10
1 ≤ length of S ≤ 60


Example
Input:
3
((()))
(())()
()(()

Output:
YES
YES
NO

Explanation
Example is self-explanatory.
"""
prompt = prompt_template.format(problem=problem)

# Tokenize the prompt and generate; sampling at temperature 0.7 yields varied reasoning traces
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=1200,  # leave headroom for the <think> reasoning trace
    temperature=0.7,
    do_sample=True,
    pad_token_id=tokenizer.eos_token_id
)

# Keep only the text after the response marker (includes the <think> block)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response.split("### Response:")[1])
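Reasoning traces can run long, so it is often convenient to watch tokens as they are produced. transformers' TextStreamer can be attached to generate for this; a minimal sketch, reusing the inputs from above:

from transformers import TextStreamer

# Stream generated tokens to stdout, skipping the echoed prompt
streamer = TextStreamer(tokenizer, skip_prompt=True)
model.generate(
    **inputs,
    max_new_tokens=1200,
    temperature=0.7,
    do_sample=True,
    pad_token_id=tokenizer.eos_token_id,
    streamer=streamer,
)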

Model Capabilities

The model excels at:

  • Algorithm Design: Creating efficient algorithms for complex problems
  • Code Optimization: Improving time and space complexity
  • Problem Analysis: Breaking down complex problems into manageable steps
  • Mathematical Reasoning: Solving problems requiring mathematical insights
  • Data Structure Implementation: Designing and implementing advanced data structures

Prompt Format

The model expects prompts in the following format:

### Instruction:
You are a coding expert with advanced knowledge in programming, algorithms, and problem-solving. 
Please solve the following coding problem with detailed reasoning.

### Problem:
[Your coding problem here]

### Response:
<think>
[The model will provide step-by-step reasoning here]
</think>
[Final solution/answer here]
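When post-processing output in this format, the chain-of-thought can be separated from the final answer by splitting on the closing </think> tag. A small sketch, assuming the model reliably emits the tag:

# generated = model output after "### Response:" (see Usage above)
reasoning, sep, answer = generated.partition("</think>")
if sep:  # tag found: `answer` holds the final solution
    print(answer.strip())
else:    # tag missing: fall back to the full output
    print(generated.strip())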

Performance

Because training focused exclusively on the hardest problems in OpenCodeReasoning, the model is expected to perform best on:

  • Advanced algorithmic challenges
  • Complex data structure problems
  • Mathematical programming tasks
  • Optimization problems

Limitations

  • The model is specialized for code reasoning and may not perform as well on general conversation
  • Training was focused on very hard problems, so the model may over-engineer solutions to simple tasks
  • Like all language models, it may occasionally generate incorrect or suboptimal solutions
  • The model should be used as a coding assistant, not a replacement for human review

Training Infrastructure

  • GPU: NVIDIA A100/V100 (recommended)
  • Memory: 16GB+ GPU memory required
  • Framework: Unsloth for efficient training
  • Quantization: Trained with 4-bit quantization for memory efficiency
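If GPU memory is tight at inference time, the model can likewise be loaded in 4-bit via bitsandbytes. A minimal sketch, assuming bitsandbytes is installed:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Quantize weights to 4-bit on load; compute in fp16
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)
model = AutoModelForCausalLM.from_pretrained(
    "Soumyajit-7/code-reasoning-deepseek-8b",
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("Soumyajit-7/code-reasoning-deepseek-8b")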

Ethical Considerations

This model is designed for educational and research purposes. Users should:

  • Verify generated code before using in production
  • Understand the logic behind solutions rather than blindly copying
  • Use responsibly for learning and problem-solving enhancement

Future Work

Potential improvements:

  • Training on additional challenging datasets
  • Multi-language code generation support
  • Integration with code execution environments
  • Fine-tuning on specific programming domains

Citation

If you use this model in your research, please cite:

@misc{code-reasoning-deepseek-8b,
  title={DeepSeek R1 Code Reasoning 8B},
  author={Soumyajit},
  year={2025},
  howpublished={\url{https://huggingface.co/Soumyajit-7/code-reasoning-deepseek-8b}},
}

Acknowledgments

  • Based on DeepSeek-R1-Distill-Llama-8B
  • Trained using Unsloth for efficient fine-tuning
  • Dataset from NVIDIA's OpenCodeReasoning project
  • Special thanks to the open-source community for making this possible

Model trained and maintained by Soumyajit-7. For questions or issues, please open an issue in the repository.
