DeepSeek R1 Code Reasoning 8B

Model Description

This model is a fine-tuned version of unsloth/DeepSeek-R1-Distill-Llama-8B specialized for advanced code reasoning. It was trained on challenging programming problems from the nvidia/OpenCodeReasoning dataset, specifically those rated "VERY_HARD" (difficulty levels 10 and 11).

Model Details

  • Base Model: DeepSeek-R1-Distill-Llama-8B
  • Model Type: Causal Language Model (Fine-tuned)
  • Architecture: LLaMA-based transformer
  • Parameters: ~8 billion
  • Training Data: Filtered nvidia/OpenCodeReasoning dataset (VERY_HARD difficulty problems)
  • Fine-tuning Method: LoRA (Low-Rank Adaptation)
  • License: Apache 2.0

Training Details

  • Training Framework: Unsloth + Transformers
  • Fine-tuning Method: LoRA with rank 16
  • Batch Size: 2 per device with 4 gradient accumulation steps
  • Learning Rate: 2e-4
  • Optimizer: AdamW 8-bit
  • Precision: Mixed precision (FP16/BF16)
  • Max Sequence Length: 2048 tokens
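Below is a minimal sketch of what this setup likely looked like with Unsloth's standard LoRA workflow. It is illustrative only: the LoRA alpha, target modules, dataset preparation, and output directory are assumptions, not the exact training script.

import torch
from unsloth import FastLanguageModel
from trl import SFTTrainer
from transformers import TrainingArguments

# Load the base model in 4-bit with the sequence length listed above
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/DeepSeek-R1-Distill-Llama-8B",
    max_seq_length=2048,
    load_in_4bit=True,
)

# Attach LoRA adapters; rank 16 matches the card, alpha and target
# modules are assumptions (not stated in the card)
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=train_dataset,  # filtered OpenCodeReasoning examples (see Dataset)
    dataset_text_field="text",    # assumption: prompts rendered into a single text field
    max_seq_length=2048,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        learning_rate=2e-4,
        optim="adamw_8bit",
        fp16=not torch.cuda.is_bf16_supported(),
        bf16=torch.cuda.is_bf16_supported(),
        output_dir="outputs",
    ),
)
trainer.train()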

Dataset

The model was trained on a carefully filtered subset of the nvidia/OpenCodeReasoning dataset:

  • Source: nvidia/OpenCodeReasoning (split_0)
  • Filter Criteria: Only problems with difficulty "VERY_HARD", 10, or 11
  • Columns Used: input (problem), output (expected result), solution (reasoning)
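A hedged sketch of the filtering step, assuming the standard datasets API; the exact split name and the encoding of the difficulty field (string vs. numeric) are assumptions based on the description above:

from datasets import load_dataset

# Load the split_0 configuration of OpenCodeReasoning
ds = load_dataset("nvidia/OpenCodeReasoning", "split_0", split="split_0")

# Keep only the hardest problems; the field name and value encoding
# ("VERY_HARD" vs. 10/11) are assumptions
hard = ds.filter(lambda ex: ex["difficulty"] in ("VERY_HARD", "10", "11", 10, 11))
print(len(hard), "examples retained")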

Intended Use

This model is designed for:

  • Advanced algorithmic problem solving
  • Code generation with detailed reasoning
  • Educational purposes for understanding complex programming concepts
  • Research in automated code reasoning

Usage

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Soumyajit-7/code-reasoning-deepseek-8b")
model = AutoModelForCausalLM.from_pretrained(
    "Soumyajit-7/code-reasoning-deepseek-8b",
    torch_dtype=torch.float16,
    device_map="auto"
)

# Define the prompt template
prompt_template = """Below is an instruction that describes a coding task, paired with an input that provides further context. 
Write a response that appropriately completes the request. 
Before answering, think carefully about the problem and create a step-by-step chain of thoughts to ensure a logical and accurate response.

### Instruction:
You are a coding expert with advanced knowledge in programming, algorithms, and problem-solving. 
Please solve the following coding problem with detailed reasoning.

### Problem:
{problem}

### Response:
<think>"""

# Example usage
problem = """
Problem description.
Vipul is a hardworking super-hero who maintains the bracket ratio of all the strings in the world. Recently he indulged himself in saving the string population so much that he lost his ability for checking brackets (luckily, not permanently). Being his super-hero friend, help him in his time of hardship.
Input

The first line of the input contains an integer T denoting the number of test cases. The description of T test cases follows.
The first line of each test case contains a single string S denoting the string to be checked.


Output

For each test case, output a single line printing "YES" or "NO" (without quotes and in uppercase only), denoting whether the brackets in the given string are balanced.


Constraints

1 ≤ T ≤ 10
1 ≤ length of S ≤ 60


Example
Input:
3
((()))
(())()
()(()

Output:
YES
YES
NO

Explanation
Example is self-explanatory.
"""
prompt = prompt_template.format(problem=problem)

# Tokenize the prompt and generate; sampling at temperature 0.7 yields varied reasoning traces
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=1200,  # leave headroom for the <think> reasoning trace
    temperature=0.7,
    do_sample=True,
    pad_token_id=tokenizer.eos_token_id
)

# Keep only the text after the response marker (includes the <think> block)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response.split("### Response:")[1])
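Reasoning traces can run long, so it is often convenient to watch tokens as they are produced. transformers' TextStreamer can be attached to generate for this; a minimal sketch, reusing the inputs from above:

from transformers import TextStreamer

# Stream generated tokens to stdout, skipping the echoed prompt
streamer = TextStreamer(tokenizer, skip_prompt=True)
model.generate(
    **inputs,
    max_new_tokens=1200,
    temperature=0.7,
    do_sample=True,
    pad_token_id=tokenizer.eos_token_id,
    streamer=streamer,
)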

Model Capabilities

The model excels at:

  • Algorithm Design: Creating efficient algorithms for complex problems
  • Code Optimization: Improving time and space complexity
  • Problem Analysis: Breaking down complex problems into manageable steps
  • Mathematical Reasoning: Solving problems requiring mathematical insights
  • Data Structure Implementation: Designing and implementing advanced data structures

Prompt Format

The model expects prompts in the following format:

### Instruction:
You are a coding expert with advanced knowledge in programming, algorithms, and problem-solving. 
Please solve the following coding problem with detailed reasoning.

### Problem:
[Your coding problem here]

### Response:
<think>
[The model will provide step-by-step reasoning here]
</think>
[Final solution/answer here]
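When post-processing output in this format, the chain-of-thought can be separated from the final answer by splitting on the closing </think> tag. A small sketch, assuming the model reliably emits the tag:

# generated = model output after "### Response:" (see Usage above)
reasoning, sep, answer = generated.partition("</think>")
if sep:  # tag found: `answer` holds the final solution
    print(answer.strip())
else:    # tag missing: fall back to the full output
    print(generated.strip())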

Performance

Because training focused exclusively on the hardest problems in OpenCodeReasoning, the model is expected to perform best on:

  • Advanced algorithmic challenges
  • Complex data structure problems
  • Mathematical programming tasks
  • Optimization problems

Limitations

  • The model is specialized for code reasoning and may not perform as well on general conversation
  • Training was focused on very hard problems, so the model may over-engineer solutions to simple tasks
  • Like all language models, it may occasionally generate incorrect or suboptimal solutions
  • The model should be used as a coding assistant, not a replacement for human review

Training Infrastructure

  • GPU: NVIDIA A100/V100 (recommended)
  • Memory: 16GB+ GPU memory required
  • Framework: Unsloth for efficient training
  • Quantization: Trained with 4-bit quantization for memory efficiency
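If GPU memory is tight at inference time, the model can likewise be loaded in 4-bit via bitsandbytes. A minimal sketch, assuming bitsandbytes is installed:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Quantize weights to 4-bit on load; compute in fp16
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)
model = AutoModelForCausalLM.from_pretrained(
    "Soumyajit-7/code-reasoning-deepseek-8b",
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("Soumyajit-7/code-reasoning-deepseek-8b")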

Ethical Considerations

This model is designed for educational and research purposes. Users should:

  • Verify generated code before using in production
  • Understand the logic behind solutions rather than blindly copying
  • Use responsibly for learning and problem-solving enhancement

Future Work

Potential improvements:

  • Training on additional challenging datasets
  • Multi-language code generation support
  • Integration with code execution environments
  • Fine-tuning on specific programming domains

Citation

If you use this model in your research, please cite:

@misc{code-reasoning-deepseek-8b,
  title={DeepSeek R1 Code Reasoning 8B},
  author={Soumyajit},
  year={2025},
  howpublished={\url{https://huggingface.co/Soumyajit-7/code-reasoning-deepseek-8b}},
}

Acknowledgments

  • Based on DeepSeek-R1-Distill-Llama-8B
  • Trained using Unsloth for efficient fine-tuning
  • Dataset from NVIDIA's OpenCodeReasoning project
  • Special thanks to the open-source community for making this possible

Model trained and maintained by Soumyajit-7. For questions or issues, please open an issue in the repository.
