---
license: apache-2.0
datasets:
- GetSoloTech/Code-Reasoning
language:
- en
base_model:
- GetSoloTech/Qwen3-Code-Reasoning-4B
pipeline_tag: text-generation
tags:
- coding
- reasoning
- problem-solving
- algorithms
- python
- c++
---
|
|
|
# GetSoloTech/Qwen3-Code-Reasoning-4B-GGUF |
|
|
|
This is the GGUF quantized version of the [Qwen3-Code-Reasoning-4B](https://huggingface.co/GetSoloTech/Qwen3-Code-Reasoning-4B) model, optimized for competitive programming and code reasoning tasks. The underlying model was fine-tuned on the high-quality Code-Reasoning dataset to improve its ability to solve complex programming problems with detailed reasoning.
|
|
|
|
|
## 🚀 Key Features
|
|
|
* **Enhanced Code Reasoning**: Specifically trained on competitive programming problems |
|
* **Thinking Capabilities**: Inherits the advanced reasoning capabilities from the base model |
|
* **High-Quality Solutions**: Trained on solutions with ≥85% test case pass rates
|
* **Structured Output**: Optimized for generating well-reasoned programming solutions |
|
* **Efficient Inference**: GGUF format enables fast inference on CPU and GPU |
|
* **Multiple Quantization Levels**: Available in various precision levels for different hardware requirements |
|
|
|
### Dataset Statistics |
|
|
|
* **Split**: Python |
|
* **Source**: High-quality competitive programming problems from TACO, APPS, CodeContests, and Codeforces |
|
* **Quality Filter**: Only correctly solved problems with ≥85% test case pass rates
|
|
|
## 🔧 Usage
|
|
|
### Using with llama.cpp |
|
|
|
```bash
# Download the model (choose your preferred quantization)
wget https://huggingface.co/GetSoloTech/Qwen3-Code-Reasoning-4B-GGUF/resolve/main/qwen3-code-reasoning-4b.Q4_K_M.gguf

# Run inference with the llama-cli binary built from llama.cpp
./llama-cli -m qwen3-code-reasoning-4b.Q4_K_M.gguf -n 4096 --repeat-penalty 1.1 \
  -p "You are an expert competitive programmer. Read the problem and produce a correct, efficient solution. Include reasoning if helpful.\n\nProblem: Your programming problem here..."
```
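
If you prefer not to hard-code download URLs, the file can also be fetched with the `huggingface_hub` Python client. A minimal sketch (the filename is assumed to match the one used above):

```python
from huggingface_hub import hf_hub_download

# Download one quantization from the Hub; returns the local file path
model_path = hf_hub_download(
    repo_id="GetSoloTech/Qwen3-Code-Reasoning-4B-GGUF",
    filename="qwen3-code-reasoning-4b.Q4_K_M.gguf",  # assumed filename, matching the wget example
)
print(model_path)
```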
|
|
|
### Using with Python (llama-cpp-python) |
|
|
|
```python
from llama_cpp import Llama

# Load the model; increase n_ctx if you expect long reasoning traces
llm = Llama(
    model_path="./qwen3-code-reasoning-4b.Q4_K_M.gguf",
    n_ctx=4096,
    n_threads=4,
)

# Prepare input for a competitive programming problem
prompt = """You are an expert competitive programmer. Read the problem and produce a correct, efficient solution. Include reasoning if helpful.

Problem: Your programming problem here..."""

# Generate a solution with the recommended code-generation settings
output = llm(
    prompt,
    max_tokens=4096,
    temperature=0.7,
    top_p=0.8,
    top_k=20,
    repeat_penalty=1.1,
)

print(output['choices'][0]['text'])
```
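
Since the base model uses a ChatML-style chat template, llama-cpp-python's chat-completion API (which applies the chat template stored in the GGUF metadata, when present) may produce better-structured outputs than raw prompting. A minimal sketch:

```python
from llama_cpp import Llama

llm = Llama(model_path="./qwen3-code-reasoning-4b.Q4_K_M.gguf", n_ctx=4096)

# create_chat_completion formats the messages with the model's chat template
response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are an expert competitive programmer."},
        {"role": "user", "content": "Problem: Your programming problem here..."},
    ],
    max_tokens=4096,
    temperature=0.7,
    top_p=0.8,
    top_k=20,
    repeat_penalty=1.1,
)
print(response["choices"][0]["message"]["content"])
```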
|
|
|
### Using with Ollama |
|
|
|
```bash
# Create a Modelfile
cat > Modelfile << EOF
FROM ./qwen3-code-reasoning-4b.Q4_K_M.gguf
TEMPLATE """{{ if .System }}<|im_start|>system
{{ .System }}<|im_end|>
{{ end }}{{ if .Prompt }}<|im_start|>user
{{ .Prompt }}<|im_end|>
{{ end }}<|im_start|>assistant
"""
PARAMETER temperature 0.7
PARAMETER top_p 0.8
PARAMETER top_k 20
PARAMETER repeat_penalty 1.1
EOF

# Create and run the model
ollama create qwen3-code-reasoning -f Modelfile
ollama run qwen3-code-reasoning "Solve this competitive programming problem: [your problem here]"
```
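
Once created, the model can also be queried programmatically through Ollama's local REST API. A sketch using the standard `/api/generate` endpoint on the default port 11434:

```python
import json
import urllib.request

# Send a generation request to the local Ollama server
req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps({
        "model": "qwen3-code-reasoning",
        "prompt": "Solve this competitive programming problem: [your problem here]",
        "stream": False,  # return the full response as a single JSON object
    }).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])
```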
|
|
|
## 📊 Available Quantizations
|
|
|
| Quantization | Size | Memory Usage | Quality | Use Case |
|--------------|------|--------------|---------|----------|
| Q3_K_M | 2.08 GB | ~3 GB | Good | CPU inference, limited memory |
| Q4_K_M | 2.5 GB | ~4 GB | Better | Balanced performance/memory |
| Q5_K_M | 2.89 GB | ~5 GB | Very Good | High quality, moderate memory |
| Q6_K | 3.31 GB | ~6 GB | Excellent | High quality, more memory |
| Q8_0 | 4.28 GB | ~8 GB | Best | Maximum quality, high memory |
| F16 | 8.05 GB | ~16 GB | Original | Maximum quality, GPU recommended |
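
As a rough rule of thumb, pick the largest quantization whose memory column fits comfortably in your free RAM. An illustrative helper (the sizes mirror the table above; `psutil` is an assumed third-party dependency):

```python
import psutil

# Approximate RAM needed per quantization, in GiB (from the table above)
QUANT_RAM_GIB = {
    "Q3_K_M": 3, "Q4_K_M": 4, "Q5_K_M": 5,
    "Q6_K": 6, "Q8_0": 8, "F16": 16,
}

def pick_quantization() -> str:
    """Return the highest-quality quantization that fits in available RAM."""
    avail_gib = psutil.virtual_memory().available / 2**30
    fitting = [q for q, need in QUANT_RAM_GIB.items() if need <= avail_gib]
    # The dict is ordered smallest to largest, so the last fit is the best
    return fitting[-1] if fitting else "Q3_K_M"

print(pick_quantization())
```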
|
|
|
## 📈 Performance Expectations
|
|
|
This GGUF quantized model largely preserves the performance characteristics of the original fine-tuned model (expect some quality loss at the lower quantization levels):
|
|
|
* **Competitive Programming Problems**: Better understanding of problem constraints and requirements |
|
* **Code Generation**: More accurate and efficient solutions |
|
* **Reasoning Quality**: Enhanced step-by-step reasoning for complex problems |
|
* **Solution Completeness**: More comprehensive solutions with proper edge case handling |
|
|
|
## 🎛️ Recommended Settings
|
|
|
### For Code Generation |
|
|
|
* **Temperature**: 0.7 |
|
* **Top-p**: 0.8 |
|
* **Top-k**: 20 |
|
* **Max New Tokens**: 4096 (adjust based on problem complexity) |
|
* **Repeat Penalty**: 1.1 |
|
|
|
### For Reasoning Tasks |
|
|
|
* **Temperature**: 0.6 |
|
* **Top-p**: 0.95 |
|
* **Top-k**: 20 |
|
* **Max New Tokens**: 8192 (for complex reasoning) |
|
* **Repeat Penalty**: 1.1 |
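
A minimal llama-cpp-python sketch applying the reasoning settings above (the context window is raised so the 8192-token budget actually fits; the exact `n_ctx` value is an assumption):

```python
from llama_cpp import Llama

# A larger context window leaves room for long chains of reasoning
llm = Llama(model_path="./qwen3-code-reasoning-4b.Q4_K_M.gguf", n_ctx=16384)

output = llm(
    "Explain, step by step, how to detect a cycle in a directed graph.",
    max_tokens=8192,
    temperature=0.6,
    top_p=0.95,
    top_k=20,
    repeat_penalty=1.1,
)
print(output["choices"][0]["text"])
```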
|
|
|
## 🛠️ Hardware Requirements
|
|
|
### Minimum Requirements |
|
* **RAM**: 4 GB (for Q3_K_M quantization) |
|
* **Storage**: 2.5 GB free space |
|
* **CPU**: Multi-core processor recommended |
|
|
|
### Recommended Requirements |
|
* **RAM**: 8 GB or more |
|
* **Storage**: 5 GB free space |
|
* **GPU**: NVIDIA GPU with 4GB+ VRAM (optional, for faster inference) |
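
If a GPU is available and llama-cpp-python was built with CUDA support, layers can be offloaded to VRAM for faster inference. A sketch:

```python
from llama_cpp import Llama

# Offload layers to the GPU (requires a CUDA-enabled build of llama-cpp-python)
llm = Llama(
    model_path="./qwen3-code-reasoning-4b.Q4_K_M.gguf",
    n_ctx=4096,
    n_gpu_layers=-1,  # -1 offloads every layer; lower this if VRAM is limited
)
```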
|
|
|
## 🤝 Contributing
|
|
|
This GGUF model was converted from the original LoRA-finetuned model. For questions about: |
|
|
|
* The original model: [GetSoloTech/Qwen3-Code-Reasoning-4B](https://huggingface.co/GetSoloTech/Qwen3-Code-Reasoning-4B) |
|
* The base model: [Qwen3 GitHub](https://github.com/QwenLM/Qwen3) |
|
* The training dataset: [Code-Reasoning Repository](https://huggingface.co/datasets/GetSoloTech/Code-Reasoning) |
|
* The training framework: [Unsloth Documentation](https://github.com/unslothai/unsloth) |
|
|
|
## 📄 License
|
|
|
This model follows the same license as the base model (Apache 2.0). Please refer to the base model license for details. |
|
|
|
## 🙏 Acknowledgments
|
|
|
* **Qwen Team** for the excellent base model |
|
* **Unsloth Team** for the efficient training framework |
|
* **NVIDIA Research** for the original OpenCodeReasoning-2 dataset |
|
* **llama.cpp community** for the GGUF format and tools |
|
|
|
## 📞 Contact
|
|
|
For questions about this GGUF model, please open an issue in the repository. |
|
|
|
--- |
|
|
|
**Note**: This model is specifically optimized for competitive programming and code reasoning tasks. The GGUF format enables efficient inference on various hardware configurations while maintaining the model's reasoning capabilities. |