Gemma3-Code-Reasoning-4B-GGUF / README.md

Create README.md

7f88419 verified 4 days ago

5.85 kB

	---
	datasets:
	- GetSoloTech/Code-Reasoning
	base_model:
	- GetSoloTech/Gemma3-Code-Reasoning-4B
	pipeline_tag: text-generation
	tags:
	- coding
	- reasoning
	- problem-solving
	- algorithms
	- python
	- c++
	- code-reasoning
	- competitive-programming
	---

	# Gemma3-Code-Reasoning-4B-GGUF

	This repository contains GGUF (GGML Universal Format) quantized versions of the [GetSoloTech/Gemma3-Code-Reasoning-4B](https://huggingface.co/GetSoloTech/Gemma3-Code-Reasoning-4B) model, optimized for local inference with various quantization levels to balance performance and resource usage.

	## 🎯 Model Overview

	This is a LoRA-finetuned version of `gemma-3-4b-it` specifically optimized for competitive programming and code reasoning tasks. The model has been trained on the high-quality Code-Reasoning dataset to enhance its capabilities in solving complex programming problems with detailed reasoning.

	## 🚀 Key Features

	- Enhanced Code Reasoning: Specifically trained on competitive programming problems
	- Thinking Capabilities: Inherits the advanced reasoning capabilities from the base model
	- High-Quality Solutions: Trained on solutions with ≥85% test case pass rates
	- Structured Output: Optimized for generating well-reasoned programming solutions
	- Efficient Training: Uses LoRA adapters for efficient parameter updates
	- Multiple Quantization Levels: Available in various GGUF formats for different hardware capabilities

	## 📁 Available GGUF Models
	\| Model File \| Size \| Quantization \| Use Case \|
	\|------------\|------\|--------------\|----------\|
	\| `Gemma3-Code-Reasoning-4B.f16.gguf` \| 7.77 GB \| FP16 \| Highest quality, requires more VRAM \|
	\| `Gemma3-Code-Reasoning-4B.Q8_0.gguf` \| 4.13 GB \| Q8_0 \| High quality, good balance \|
	\| `Gemma3-Code-Reasoning-4B.Q6_K.gguf` \| 3.19 GB \| Q6_K \| Good quality, moderate VRAM usage \|
	\| `Gemma3-Code-Reasoning-4B.Q5_K_M.gguf` \| 2.83 GB \| Q5_K_M \| Balanced quality and size \|
	\| `Gemma3-Code-Reasoning-4B.Q4_K_M.gguf` \| 2.49 GB \| Q4_K_M \| Good compression, reasonable quality \|
	\| `Gemma3-Code-Reasoning-4B.Q3_K_M.gguf` \| 2.1 GB \| Q3_K_M \| Smaller size, moderate quality \|
	\| `Gemma3-Code-Reasoning-4B.Q2_K.gguf` \| 1.73 GB \| Q2_K \| Smallest size, basic quality \|
	\| `Gemma3-Code-Reasoning-4B.IQ4_XS.gguf` \| 2.28 GB \| IQ4_XS \| Intel optimized, good quality \|

	## 🔧 Usage

	### Using with llama.cpp

	```bash
	# Download a GGUF model file
	wget https://huggingface.co/GetSoloTech/Gemma3-Code-Reasoning-4B-GGUF/resolve/main/Gemma3-Code-Reasoning-4B.Q4_K_M.gguf

	# Run inference with llama.cpp
	./llama.cpp/main -m Gemma3-Code-Reasoning-4B.Q4_K_M.gguf -n 4096 --repeat_penalty 1.1 -p "You are an expert competitive programmer. Solve this problem: [YOUR_PROBLEM_HERE]"
	```

	### Using with Python (llama-cpp-python)

	```python
	from llama_cpp import Llama

	# Load the model
	llm = Llama(
	model_path="./Gemma3-Code-Reasoning-4B.Q4_K_M.gguf",
	n_ctx=4096,
	n_threads=4
	)

	# Prepare the prompt
	prompt = """You are an expert competitive programmer. Read the problem and produce a correct, efficient solution. Include reasoning if helpful.

	Problem: [YOUR_PROGRAMMING_PROBLEM_HERE]

	Solution:"""

	# Generate response
	output = llm(
	prompt,
	max_tokens=4096,
	temperature=1.0,
	top_p=0.95,
	top_k=64,
	repeat_penalty=1.1
	)

	print(output['choices'][0]['text'])
	```


	## 🎛️ Recommended Settings

	- Temperature: 1.0
	- Top-p: 0.95
	- Top-k: 64
	- Max New Tokens: 4096 (adjust based on problem complexity)
	- Repeat Penalty: 1.1


	## 💻 Hardware Requirements

	\| Quantization \| Minimum VRAM \| Recommended VRAM \| CPU RAM \|
	\|--------------\|--------------\|------------------\|---------\|
	\| FP16 \| 8 GB \| 12 GB \| 16 GB \|
	\| Q8_0 \| 5 GB \| 8 GB \| 12 GB \|
	\| Q6_K \| 4 GB \| 6 GB \| 10 GB \|
	\| Q5_K_M \| 3 GB \| 5 GB \| 8 GB \|
	\| Q4_K_M \| 3 GB \| 4 GB \| 6 GB \|
	\| Q3_K_M \| 2 GB \| 3 GB \| 4 GB \|
	\| Q2_K \| 2 GB \| 2 GB \| 3 GB \|
	\| IQ4_XS \| 3 GB \| 4 GB \| 6 GB \|

	## 📈 Performance Expectations

	This finetuned model is expected to show improved performance on:

	- Competitive Programming Problems: Better understanding of problem constraints and requirements
	- Code Generation: More accurate and efficient solutions
	- Reasoning Quality: Enhanced step-by-step reasoning for complex problems
	- Solution Completeness: More comprehensive solutions with proper edge case handling

	## 🔗 Related Resources

	- Base Model: [GetSoloTech/Gemma3-Code-Reasoning-4B](https://huggingface.co/GetSoloTech/Gemma3-Code-Reasoning-4B)
	- Training Dataset: [GetSoloTech/Code-Reasoning](https://huggingface.co/datasets/GetSoloTech/Code-Reasoning)
	- Original Gemma Model: [google/gemma-3-4b-it](https://huggingface.co/google/gemma-3-4b-it)
	- llama.cpp: [GitHub Repository](https://github.com/ggerganov/llama.cpp)
	- llama-cpp-python: [PyPI Package](https://pypi.org/project/llama-cpp-python/)

	## 🤝 Contributing

	This model was created using the Unsloth framework and the Code-Reasoning dataset. For questions about:

	- The base model: [Gemma3 Huggingface](https://huggingface.co/google/gemma-3-4b-it)
	- The training dataset: [Code-Reasoning Repository](https://huggingface.co/datasets/GetSoloTech/Code-Reasoning)
	- The training framework: [Unsloth Documentation](https://github.com/unslothai/unsloth)

	## 🙏 Acknowledgments

	- Gemma Team for the excellent base model
	- Unsloth Team for the efficient training framework
	- NVIDIA Research for the original OpenCodeReasoning-2 dataset
	- llama.cpp community for the GGUF format and tools

	## 📞 Contact

	For questions about this GGUF converted model, please open an issue in the repository.

	---

	Note: This model is specifically optimized for competitive programming and code reasoning tasks. Choose the appropriate quantization level based on your hardware capabilities and quality requirements.