---
license: apache-2.0
datasets:
- GetSoloTech/Code-Reasoning
language:
- en
base_model:
- GetSoloTech/Qwen3-Code-Reasoning-4B
pipeline_tag: text-generation
tags:
- coding
- reasoning
- problem-solving
- algorithms
- python
- c++
---

# GetSoloTech/Qwen3-Code-Reasoning-4B-GGUF

This is the GGUF quantized version of the [Qwen3-Code-Reasoning-4B](https://huggingface.co/GetSoloTech/Qwen3-Code-Reasoning-4B) model, specifically optimized for competitive programming and code reasoning tasks. This model has been trained on the high-quality Code-Reasoning dataset to enhance its capabilities in solving complex programming problems with detailed reasoning.


## πŸš€ Key Features

* **Enhanced Code Reasoning**: Specifically trained on competitive programming problems
* **Thinking Capabilities**: Inherits the advanced reasoning capabilities from the base model
* **High-Quality Solutions**: Trained on solutions with β‰₯85% test case pass rates
* **Structured Output**: Optimized for generating well-reasoned programming solutions
* **Efficient Inference**: GGUF format enables fast inference on CPU and GPU
* **Multiple Quantization Levels**: Available in various precision levels for different hardware requirements

### Dataset Statistics

* **Split**: Python
* **Source**: High-quality competitive programming problems from TACO, APPS, CodeContests, and Codeforces
* **Quality Filter**: Only correctly solved problems with β‰₯85% test case pass rates

## πŸ”§ Usage

### Using with llama.cpp

```bash
# Download the model (choose your preferred quantization)
wget https://huggingface.co/GetSoloTech/Qwen3-Code-Reasoning-4B-GGUF/resolve/main/qwen3-code-reasoning-4b.Q4_K_M.gguf

# Run inference
./llama-cli -m qwen3-code-reasoning-4b.Q4_K_M.gguf -n 4096 --repeat-penalty 1.1 -p "You are an expert competitive programmer. Read the problem and produce a correct, efficient solution. Include reasoning if helpful.\n\nProblem: Your programming problem here..."
```

### Using with Python (llama-cpp-python)

```python
from llama_cpp import Llama

# Load the model
llm = Llama(
    model_path="./qwen3-code-reasoning-4b.Q4_K_M.gguf",
    n_ctx=4096,
    n_threads=4
)

# Prepare input for competitive programming problem
prompt = """You are an expert competitive programmer. Read the problem and produce a correct, efficient solution. Include reasoning if helpful.

Problem: Your programming problem here..."""

# Generate solution
output = llm(
    prompt,
    max_tokens=4096,
    temperature=0.7,
    top_p=0.8,
    top_k=20,
    repeat_penalty=1.1
)

print(output['choices'][0]['text'])
```

### Using with Ollama

```bash
# Create a Modelfile
cat > Modelfile << EOF
FROM ./qwen3-code-reasoning-4b.Q4_K_M.gguf
TEMPLATE """{{ if .System }}<|im_start|>system
{{ .System }}<|im_end|>
{{ end }}{{ if .Prompt }}<|im_start|>user
{{ .Prompt }}<|im_end|>
{{ end }}<|im_start|>assistant
"""
PARAMETER temperature 0.7
PARAMETER top_p 0.8
PARAMETER top_k 20
PARAMETER repeat_penalty 1.1
EOF

# Create and run the model
ollama create qwen3-code-reasoning -f Modelfile
ollama run qwen3-code-reasoning "Solve this competitive programming problem: [your problem here]"
```

## πŸ“Š Available Quantizations

| Quantization | Size | Memory Usage | Quality | Use Case |
|--------------|------|--------------|---------|----------|
| Q3_K_M | 2.08 GB | ~3 GB | Good | CPU inference, limited memory |
| Q4_K_M | 2.5 GB | ~4 GB | Better | Balanced performance/memory |
| Q5_K_M | 2.89 GB | ~5 GB | Very Good | High quality, moderate memory |
| Q6_K | 3.31 GB | ~6 GB | Excellent | High quality, more memory |
| Q8_0 | 4.28 GB | ~8 GB | Best | Maximum quality, high memory |
| F16 | 8.05 GB | ~16 GB | Original | Maximum quality, GPU recommended |

## πŸ“ˆ Performance Expectations

This GGUF quantized model largely preserves the performance characteristics of the original finetuned model, with a small quality loss at lower-bit quantizations:

* **Competitive Programming Problems**: Better understanding of problem constraints and requirements
* **Code Generation**: More accurate and efficient solutions
* **Reasoning Quality**: Enhanced step-by-step reasoning for complex problems
* **Solution Completeness**: More comprehensive solutions with proper edge case handling

## πŸŽ›οΈ Recommended Settings

### For Code Generation

* **Temperature**: 0.7
* **Top-p**: 0.8
* **Top-k**: 20
* **Max New Tokens**: 4096 (adjust based on problem complexity)
* **Repeat Penalty**: 1.1

### For Reasoning Tasks

* **Temperature**: 0.6
* **Top-p**: 0.95
* **Top-k**: 20
* **Max New Tokens**: 8192 (for complex reasoning)
* **Repeat Penalty**: 1.1
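The two presets above can be kept as plain dictionaries and passed to a llama-cpp-python call via keyword expansion (a sketch; the dict names are illustrative):

```python
# Sampling presets from the recommended settings above; keys match
# llama-cpp-python's Llama.__call__ keyword arguments.
CODE_GENERATION = {
    "temperature": 0.7,
    "top_p": 0.8,
    "top_k": 20,
    "max_tokens": 4096,
    "repeat_penalty": 1.1,
}

REASONING = {
    "temperature": 0.6,
    "top_p": 0.95,
    "top_k": 20,
    "max_tokens": 8192,
    "repeat_penalty": 1.1,
}

# Usage (llm is a loaded llama_cpp.Llama instance):
# output = llm(prompt, **REASONING)
```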

## πŸ› οΈ Hardware Requirements

### Minimum Requirements
* **RAM**: 4 GB (for Q3_K_M quantization)
* **Storage**: 2.5 GB free space
* **CPU**: Multi-core processor recommended

### Recommended Requirements
* **RAM**: 8 GB or more
* **Storage**: 5 GB free space
* **GPU**: NVIDIA GPU with 4GB+ VRAM (optional, for faster inference)

## 🀝 Contributing

This GGUF model was converted from the original LoRA-finetuned model. For questions about:

* The original model: [GetSoloTech/Qwen3-Code-Reasoning-4B](https://huggingface.co/GetSoloTech/Qwen3-Code-Reasoning-4B)
* The base model: [Qwen3 GitHub](https://github.com/QwenLM/Qwen3)
* The training dataset: [Code-Reasoning Repository](https://huggingface.co/datasets/GetSoloTech/Code-Reasoning)
* The training framework: [Unsloth Documentation](https://github.com/unslothai/unsloth)

## πŸ“„ License

This model follows the same license as the base model (Apache 2.0). Please refer to the base model license for details.

## πŸ™ Acknowledgments

* **Qwen Team** for the excellent base model
* **Unsloth Team** for the efficient training framework
* **NVIDIA Research** for the original OpenCodeReasoning-2 dataset
* **llama.cpp community** for the GGUF format and tools

## πŸ“ž Contact

For questions about this GGUF model, please open an issue in the repository.

---

**Note**: This model is specifically optimized for competitive programming and code reasoning tasks. The GGUF format enables efficient inference on various hardware configurations while maintaining the model's reasoning capabilities.