---
license: apache-2.0
datasets:
- GetSoloTech/Code-Reasoning
language:
- en
base_model:
- GetSoloTech/Qwen3-Code-Reasoning-4B
pipeline_tag: text-generation
tags:
- coding
- reasoning
- problem-solving
- algorithms
- python
- c++
---

# GetSoloTech/Qwen3-Code-Reasoning-4B-GGUF

This is the GGUF quantized version of the [Qwen3-Code-Reasoning-4B](https://huggingface.co/GetSoloTech/Qwen3-Code-Reasoning-4B) model, optimized for competitive programming and code reasoning tasks. The model was trained on the high-quality Code-Reasoning dataset to enhance its ability to solve complex programming problems with detailed reasoning.

## 🚀 Key Features

* **Enhanced Code Reasoning**: Specifically trained on competitive programming problems
* **Thinking Capabilities**: Inherits the advanced reasoning capabilities of the base model
* **High-Quality Solutions**: Trained on solutions with ≥85% test-case pass rates
* **Structured Output**: Optimized for generating well-reasoned programming solutions
* **Efficient Inference**: GGUF format enables fast inference on CPU and GPU
* **Multiple Quantization Levels**: Available in several precision levels for different hardware requirements

### Dataset Statistics

* **Split**: Python
* **Source**: High-quality competitive programming problems from TACO, APPS, CodeContests, and Codeforces
* **Quality Filter**: Only correctly solved problems with ≥85% test-case pass rates

## 🔧 Usage

### Using with llama.cpp

```bash
# Download the model (choose your preferred quantization)
wget https://huggingface.co/GetSoloTech/Qwen3-Code-Reasoning-4B-GGUF/resolve/main/qwen3-code-reasoning-4b.Q4_K_M.gguf

# Run inference (the CLI binary is named llama-cli in current llama.cpp builds)
./llama-cli -m qwen3-code-reasoning-4b.Q4_K_M.gguf -n 4096 --repeat-penalty 1.1 -p "You are an expert competitive programmer. Read the problem and produce a correct, efficient solution. Include reasoning if helpful.\n\nProblem: Your programming problem here..."
```

### Using with Python (llama-cpp-python)

```python
from llama_cpp import Llama

# Load the model
llm = Llama(
    model_path="./qwen3-code-reasoning-4b.Q4_K_M.gguf",
    n_ctx=4096,
    n_threads=4
)

# Prepare input for a competitive programming problem
prompt = """You are an expert competitive programmer. Read the problem and produce a correct, efficient solution. Include reasoning if helpful.

Problem: Your programming problem here..."""

# Generate a solution
output = llm(
    prompt,
    max_tokens=4096,
    temperature=0.7,
    top_p=0.8,
    top_k=20,
    repeat_penalty=1.1
)

print(output['choices'][0]['text'])
```
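The completion call above feeds the model a raw prompt string. llama-cpp-python also exposes an OpenAI-style chat interface that applies the chat template stored in the GGUF metadata (the ChatML format used by Qwen3). A minimal sketch, assuming the quantized file is in the working directory and carries its chat template (llama-cpp-python falls back to a generic template otherwise):

```python
from llama_cpp import Llama

# Load the model as before
llm = Llama(
    model_path="./qwen3-code-reasoning-4b.Q4_K_M.gguf",
    n_ctx=4096,
    n_threads=4
)

# Chat-style inference: the system/user roles are converted to ChatML
# markup by the template embedded in the GGUF file
response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are an expert competitive programmer."},
        {"role": "user", "content": "Problem: Your programming problem here..."},
    ],
    max_tokens=4096,
    temperature=0.7,
    top_p=0.8,
    top_k=20,
    repeat_penalty=1.1,
)

print(response["choices"][0]["message"]["content"])
```

This avoids hand-building the `<|im_start|>`/`<|im_end|>` markup shown in the Ollama template below.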
### Using with Ollama

```bash
# Create a Modelfile
cat > Modelfile << EOF
FROM ./qwen3-code-reasoning-4b.Q4_K_M.gguf
TEMPLATE """{{ if .System }}<|im_start|>system
{{ .System }}<|im_end|>
{{ end }}{{ if .Prompt }}<|im_start|>user
{{ .Prompt }}<|im_end|>
{{ end }}<|im_start|>assistant
"""
PARAMETER temperature 0.7
PARAMETER top_p 0.8
PARAMETER top_k 20
PARAMETER repeat_penalty 1.1
EOF

# Create and run the model
ollama create qwen3-code-reasoning -f Modelfile
ollama run qwen3-code-reasoning "Solve this competitive programming problem: [your problem here]"
```
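Once created, the model can also be queried programmatically through Ollama's local REST API (port 11434 by default). A minimal sketch using the `requests` package and the `qwen3-code-reasoning` name from the `ollama create` step above; the `options` block mirrors the `PARAMETER` values in the Modelfile:

```python
import requests

response = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "qwen3-code-reasoning",
        "prompt": "Solve this competitive programming problem: [your problem here]",
        "stream": False,  # return one JSON object instead of a token stream
        "options": {
            "temperature": 0.7,
            "top_p": 0.8,
            "top_k": 20,
            "repeat_penalty": 1.1,
        },
    },
)

print(response.json()["response"])
```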
## 📊 Available Quantizations

| Quantization | Size | Memory Usage | Quality | Use Case |
|--------------|------|--------------|---------|----------|
| Q3_K_M | 2.08 GB | ~3 GB | Good | CPU inference, limited memory |
| Q4_K_M | 2.5 GB | ~4 GB | Better | Balanced performance/memory |
| Q5_K_M | 2.89 GB | ~5 GB | Very Good | High quality, moderate memory |
| Q6_K | 3.31 GB | ~6 GB | Excellent | High quality, more memory |
| Q8_0 | 4.28 GB | ~8 GB | Best | Maximum quality, high memory |
| F16 | 8.05 GB | ~16 GB | Original | Maximum quality, GPU recommended |

## 📈 Performance Expectations

This GGUF model closely tracks the behavior of the original finetuned model, with some quality loss at the lower-bit quantizations. Relative to the base model, expect:

* **Competitive Programming Problems**: Better understanding of problem constraints and requirements
* **Code Generation**: More accurate and efficient solutions
* **Reasoning Quality**: Enhanced step-by-step reasoning for complex problems
* **Solution Completeness**: More comprehensive solutions with proper edge-case handling

## 🎛️ Recommended Settings

### For Code Generation

* **Temperature**: 0.7
* **Top-p**: 0.8
* **Top-k**: 20
* **Max New Tokens**: 4096 (adjust based on problem complexity)
* **Repeat Penalty**: 1.1

### For Reasoning Tasks

* **Temperature**: 0.6
* **Top-p**: 0.95
* **Top-k**: 20
* **Max New Tokens**: 8192 (for complex reasoning)
* **Repeat Penalty**: 1.1

## 🛠️ Hardware Requirements

### Minimum Requirements

* **RAM**: 4 GB (for the Q3_K_M quantization)
* **Storage**: 2.5 GB free space
* **CPU**: Multi-core processor recommended

### Recommended Requirements

* **RAM**: 8 GB or more
* **Storage**: 5 GB free space
* **GPU**: NVIDIA GPU with 4 GB+ VRAM (optional, for faster inference)

## 🤝 Contributing

This GGUF model was converted from the original LoRA-finetuned model. For questions about:

* The original model: [GetSoloTech/Qwen3-Code-Reasoning-4B](https://huggingface.co/GetSoloTech/Qwen3-Code-Reasoning-4B)
* The base model: [Qwen3 GitHub](https://github.com/QwenLM/Qwen3)
* The training dataset: [Code-Reasoning Repository](https://huggingface.co/datasets/GetSoloTech/Code-Reasoning)
* The training framework: [Unsloth Documentation](https://github.com/unslothai/unsloth)

## 📄 License

This model is released under the same license as the base model (Apache 2.0). Please refer to the base model license for details.

## 🙏 Acknowledgments

* **Qwen Team** for the excellent base model
* **Unsloth Team** for the efficient training framework
* **NVIDIA Research** for the original OpenCodeReasoning-2 dataset
* **llama.cpp community** for the GGUF format and tools

## 📞 Contact

For questions about this GGUF model, please open an issue in the repository.

---

**Note**: This model is specifically optimized for competitive programming and code reasoning tasks. The GGUF format enables efficient inference across a range of hardware configurations while preserving the model's reasoning capabilities.