GetSoloTech/Qwen3-Code-Reasoning-4B-GGUF

This is the GGUF-quantized version of the Qwen3-Code-Reasoning-4B model, optimized for competitive programming and code-reasoning tasks. The model was trained on the high-quality Code-Reasoning dataset to strengthen its ability to solve complex programming problems with detailed reasoning.

🚀 Key Features

  • Enhanced Code Reasoning: Specifically trained on competitive programming problems
  • Thinking Capabilities: Inherits the advanced reasoning capabilities from the base model
  • High-Quality Solutions: Trained on solutions with ≥85% test case pass rates
  • Structured Output: Optimized for generating well-reasoned programming solutions
  • Efficient Inference: GGUF format enables fast inference on CPU and GPU
  • Multiple Quantization Levels: Available in various precision levels for different hardware requirements

Dataset Statistics

  • Split: Python
  • Source: High-quality competitive programming problems from TACO, APPS, CodeContests, and Codeforces
  • Quality Filter: Only correctly solved problems with ≥85% test case pass rates

🔧 Usage

Using with llama.cpp

# Download the model (choose your preferred quantization)
wget https://huggingface.co/GetSoloTech/Qwen3-Code-Reasoning-4B-GGUF/resolve/main/qwen3-code-reasoning-4b.Q4_K_M.gguf

# Run inference
./llama-cli -m qwen3-code-reasoning-4b.Q4_K_M.gguf -n 4096 --repeat-penalty 1.1 -e -p "You are an expert competitive programmer. Read the problem and produce a correct, efficient solution. Include reasoning if helpful.\n\nProblem: Your programming problem here..."
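
For a scriptable workflow, llama.cpp also ships a llama-server binary that exposes an OpenAI-compatible HTTP API (port 8080 by default). A minimal sketch using only the Python standard library; it assumes the server was started with ./llama-server -m qwen3-code-reasoning-4b.Q4_K_M.gguf -c 4096:

import json
import urllib.request

# Assumes a local server: ./llama-server -m qwen3-code-reasoning-4b.Q4_K_M.gguf -c 4096
payload = {
    "messages": [
        {"role": "system", "content": "You are an expert competitive programmer."},
        {"role": "user", "content": "Problem: Your programming problem here..."},
    ],
    "temperature": 0.7,
    "top_p": 0.8,
    "max_tokens": 4096,
}

req = urllib.request.Request(
    "http://localhost:8080/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    result = json.load(resp)

print(result["choices"][0]["message"]["content"])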

Using with Python (llama-cpp-python)

from llama_cpp import Llama

# Load the model
llm = Llama(
    model_path="./qwen3-code-reasoning-4b.Q4_K_M.gguf",
    n_ctx=4096,
    n_threads=4
)

# Prepare input for competitive programming problem
prompt = """You are an expert competitive programmer. Read the problem and produce a correct, efficient solution. Include reasoning if helpful.

Problem: Your programming problem here..."""

# Generate solution
output = llm(
    prompt,
    max_tokens=4096,
    temperature=0.7,
    top_p=0.8,
    top_k=20,
    repeat_penalty=1.1
)

print(output['choices'][0]['text'])
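
Since the base model uses a ChatML chat template, the create_chat_completion helper usually gives better results than raw text completion: it applies the chat template stored in the GGUF metadata. A short sketch reusing the llm object from above:

# Chat-style generation; the chat template is read from the GGUF metadata
response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are an expert competitive programmer."},
        {"role": "user", "content": "Problem: Your programming problem here..."},
    ],
    max_tokens=4096,
    temperature=0.7,
    top_p=0.8,
    top_k=20,
    repeat_penalty=1.1,
)

print(response["choices"][0]["message"]["content"])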

Using with Ollama

# Create a Modelfile
cat > Modelfile << EOF
FROM ./qwen3-code-reasoning-4b.Q4_K_M.gguf
TEMPLATE """{{ if .System }}<|im_start|>system
{{ .System }}<|im_end|>
{{ end }}{{ if .Prompt }}<|im_start|>user
{{ .Prompt }}<|im_end|>
{{ end }}<|im_start|>assistant
"""
PARAMETER temperature 0.7
PARAMETER top_p 0.8
PARAMETER top_k 20
PARAMETER repeat_penalty 1.1
EOF

# Create and run the model
ollama create qwen3-code-reasoning -f Modelfile
ollama run qwen3-code-reasoning "Solve this competitive programming problem: [your problem here]"
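
Ollama also serves a local REST API (port 11434 by default), so the model created above can be called programmatically. A minimal sketch with the Python standard library:

import json
import urllib.request

# Assumes `ollama create qwen3-code-reasoning -f Modelfile` has already been run
payload = {
    "model": "qwen3-code-reasoning",
    "prompt": "Solve this competitive programming problem: [your problem here]",
    "stream": False,  # return the full response as a single JSON object
}

req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.load(resp)["response"])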

📊 Available Quantizations

Quantization   Size      Memory Usage   Quality     Use Case
Q3_K_M         2.08 GB   ~3 GB          Good        CPU inference, limited memory
Q4_K_M         2.5 GB    ~4 GB          Better      Balanced performance/memory
Q5_K_M         2.89 GB   ~5 GB          Very Good   High quality, moderate memory
Q6_K           3.31 GB   ~6 GB          Excellent   High quality, more memory
Q8_0           4.28 GB   ~8 GB          Best        Maximum quality, high memory
F16            8.05 GB   ~16 GB         Original    Maximum quality, GPU recommended
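
The memory figures above are rough estimates (weights plus context buffers). As an illustration only, a hypothetical helper that picks the highest-quality quantization fitting a given RAM budget, using the approximate values from the table:

# Hypothetical helper: choose the largest quantization that fits a RAM budget.
# Approximate memory figures (GB) taken from the table above.
QUANT_MEMORY_GB = {
    "Q3_K_M": 3, "Q4_K_M": 4, "Q5_K_M": 5,
    "Q6_K": 6, "Q8_0": 8, "F16": 16,
}

def pick_quantization(ram_gb):
    """Return the highest-quality quantization fitting in ram_gb, or None."""
    fitting = [(mem, name) for name, mem in QUANT_MEMORY_GB.items() if mem <= ram_gb]
    return max(fitting)[1] if fitting else None

print(pick_quantization(6))    # -> Q6_K
print(pick_quantization(3.5))  # -> Q3_K_M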

📈 Performance Expectations

This GGUF-quantized model largely preserves the performance characteristics of the original fine-tuned model:

  • Competitive Programming Problems: Better understanding of problem constraints and requirements
  • Code Generation: More accurate and efficient solutions
  • Reasoning Quality: Enhanced step-by-step reasoning for complex problems
  • Solution Completeness: More comprehensive solutions with proper edge case handling

πŸŽ›οΈ Recommended Settings

For Code Generation

  • Temperature: 0.7
  • Top-p: 0.8
  • Top-k: 20
  • Max New Tokens: 4096 (adjust based on problem complexity)
  • Repeat Penalty: 1.1

For Reasoning Tasks

  • Temperature: 0.6
  • Top-p: 0.95
  • Top-k: 20
  • Max New Tokens: 8192 (for complex reasoning)
  • Repeat Penalty: 1.1 (both profiles are shown as reusable presets below)
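
Both profiles can be kept as plain parameter presets and unpacked into a llama-cpp-python call; a small sketch, reusing the llm object and prompt from the Python example above:

# Sampling presets mirroring the recommendations above
CODE_GENERATION = dict(temperature=0.7, top_p=0.8, top_k=20,
                       repeat_penalty=1.1, max_tokens=4096)
REASONING = dict(temperature=0.6, top_p=0.95, top_k=20,
                 repeat_penalty=1.1, max_tokens=8192)

# Unpack whichever preset matches the task
output = llm(prompt, **CODE_GENERATION)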

πŸ› οΈ Hardware Requirements

Minimum Requirements

  • RAM: 4 GB (for Q3_K_M quantization)
  • Storage: 2.5 GB free space
  • CPU: Multi-core processor recommended

Recommended Requirements

  • RAM: 8 GB or more
  • Storage: 5 GB free space
  • GPU: NVIDIA GPU with 4GB+ VRAM (optional, for faster inference; see the offload sketch below)
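
If a GPU-enabled build of llama-cpp-python is installed (e.g. compiled with CUDA support), layers can be offloaded to the GPU for a substantial speedup. A minimal sketch; n_gpu_layers=-1 offloads every layer, and a smaller count allows partial offload on limited VRAM:

from llama_cpp import Llama

# Offload all layers to the GPU; requires a GPU-enabled build of llama-cpp-python
llm_gpu = Llama(
    model_path="./qwen3-code-reasoning-4b.Q4_K_M.gguf",
    n_ctx=4096,
    n_gpu_layers=-1,  # -1 offloads every layer; lower it for limited VRAM
)

The equivalent llama.cpp CLI option is -ngl (e.g. -ngl 99).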

🤝 Contributing

This GGUF model was converted from the original LoRA-finetuned Qwen3-Code-Reasoning-4B model. For questions about the training process or the underlying dataset, please refer to the original model repository; for issues specific to these GGUF files, see the Contact section below.

📄 License

This model follows the same license as the base model (Apache 2.0). Please refer to the base model license for details.

πŸ™ Acknowledgments

  • Qwen Team for the excellent base model
  • Unsloth Team for the efficient training framework
  • NVIDIA Research for the original OpenCodeReasoning-2 dataset
  • llama.cpp community for the GGUF format and tools

📞 Contact

For questions about this GGUF model, please open an issue in the repository.


Note: This model is specifically optimized for competitive programming and code reasoning tasks. The GGUF format enables efficient inference on various hardware configurations while maintaining the model's reasoning capabilities.
