---
datasets:
- GetSoloTech/Code-Reasoning
base_model:
- GetSoloTech/Gemma3-Code-Reasoning-4B
pipeline_tag: text-generation
tags:
- coding
- reasoning
- problem-solving
- algorithms
- python
- c++
- code-reasoning
- competitive-programming
---

# Gemma3-Code-Reasoning-4B-GGUF

This repository contains GGUF quantized versions of the [GetSoloTech/Gemma3-Code-Reasoning-4B](https://huggingface.co/GetSoloTech/Gemma3-Code-Reasoning-4B) model (GGUF is the model file format used by llama.cpp). Several quantization levels are provided so you can balance output quality against local memory and compute.

## 🎯 Model Overview

This is a **LoRA-finetuned** version of `gemma-3-4b-it`, optimized for competitive programming and code reasoning tasks. It was trained on the Code-Reasoning dataset to improve its ability to solve complex programming problems with detailed, step-by-step reasoning.

## 🚀 Key Features

- **Enhanced Code Reasoning**: Specifically trained on competitive programming problems
- **Thinking Capabilities**: Inherits the step-by-step reasoning behavior of the base model
- **High-Quality Solutions**: Trained on solutions with ≥85% test-case pass rates
- **Structured Output**: Optimized for generating well-reasoned programming solutions
- **Efficient Training**: Uses LoRA adapters for efficient parameter updates
- **Multiple Quantization Levels**: Available in various GGUF formats for different hardware capabilities

## πŸ“ Available GGUF Models
| Model File | Size | Quantization | Use Case |
|------------|------|--------------|----------|
| `Gemma3-Code-Reasoning-4B.f16.gguf` | 7.77 GB | FP16 | Highest quality, requires more VRAM |
| `Gemma3-Code-Reasoning-4B.Q8_0.gguf` | 4.13 GB | Q8_0 | High quality, good balance |
| `Gemma3-Code-Reasoning-4B.Q6_K.gguf` | 3.19 GB | Q6_K | Good quality, moderate VRAM usage |
| `Gemma3-Code-Reasoning-4B.Q5_K_M.gguf` | 2.83 GB | Q5_K_M | Balanced quality and size |
| `Gemma3-Code-Reasoning-4B.Q4_K_M.gguf` | 2.49 GB | Q4_K_M | Good compression, reasonable quality |
| `Gemma3-Code-Reasoning-4B.Q3_K_M.gguf` | 2.1 GB | Q3_K_M | Smaller size, moderate quality |
| `Gemma3-Code-Reasoning-4B.Q2_K.gguf` | 1.73 GB | Q2_K | Smallest size, basic quality |
| `Gemma3-Code-Reasoning-4B.IQ4_XS.gguf` | 2.28 GB | IQ4_XS | Importance-matrix quant, good quality at a small size |
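
Any of these files can also be fetched programmatically with the Hugging Face Hub client; a minimal sketch, assuming the `huggingface_hub` package is installed:

```python
# Minimal sketch: download one quantization level from this repository.
# Requires `pip install huggingface_hub`; pick any filename from the table above.
from huggingface_hub import hf_hub_download

model_path = hf_hub_download(
    repo_id="GetSoloTech/Gemma3-Code-Reasoning-4B-GGUF",
    filename="Gemma3-Code-Reasoning-4B.Q4_K_M.gguf",
)
print(model_path)  # local path to pass to llama.cpp or llama-cpp-python
```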

## 🔧 Usage

### Using with llama.cpp

```bash
# Download a GGUF model file
wget https://huggingface.co/GetSoloTech/Gemma3-Code-Reasoning-4B-GGUF/resolve/main/Gemma3-Code-Reasoning-4B.Q4_K_M.gguf

# Run inference with llama.cpp (recent builds name the CLI binary `llama-cli`;
# older builds call it `main`)
./llama-cli -m Gemma3-Code-Reasoning-4B.Q4_K_M.gguf -n 4096 --repeat-penalty 1.1 -p "You are an expert competitive programmer. Solve this problem: [YOUR_PROBLEM_HERE]"
```

### Using with Python (llama-cpp-python)

```python
from llama_cpp import Llama

# Load the model
llm = Llama(
    model_path="./Gemma3-Code-Reasoning-4B.Q4_K_M.gguf",
    n_ctx=4096,
    n_threads=4
)

# Prepare the prompt
prompt = """You are an expert competitive programmer. Read the problem and produce a correct, efficient solution. Include reasoning if helpful.

Problem: [YOUR_PROGRAMMING_PROBLEM_HERE]

Solution:"""

# Generate response
output = llm(
    prompt,
    max_tokens=4096,
    temperature=1.0,
    top_p=0.95,
    top_k=64,
    repeat_penalty=1.1
)

print(output['choices'][0]['text'])
```
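
If your llama-cpp-python version supports the chat API, you can instead let the library apply the chat template stored in the GGUF metadata rather than formatting the prompt by hand. A sketch reusing the `llm` instance from above (exact template handling depends on your library version):

```python
# Chat-style alternative: llama-cpp-python applies the model's chat template.
response = llm.create_chat_completion(
    messages=[
        {
            "role": "user",
            "content": "You are an expert competitive programmer. "
                       "Solve this problem: [YOUR_PROGRAMMING_PROBLEM_HERE]",
        },
    ],
    max_tokens=4096,
    temperature=1.0,
    top_p=0.95,
    top_k=64,
    repeat_penalty=1.1,
)

print(response["choices"][0]["message"]["content"])
```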


## πŸŽ›οΈ Recommended Settings

- **Temperature**: 1.0
- **Top-p**: 0.95
- **Top-k**: 64
- **Max New Tokens**: 4096 (adjust based on problem complexity)
- **Repeat Penalty**: 1.1
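
These settings map directly onto the llama-cpp-python generation arguments used in the usage example above, so they can be kept in one dict and reused across calls; for instance:

```python
# Recommended settings as keyword arguments for llama-cpp-python's Llama.__call__
# (same parameter names as in the usage example above).
GEN_KWARGS = {
    "max_tokens": 4096,   # max new tokens; raise for harder problems
    "temperature": 1.0,
    "top_p": 0.95,
    "top_k": 64,
    "repeat_penalty": 1.1,
}

output = llm(prompt, **GEN_KWARGS)
```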


## 💻 Hardware Requirements

| Quantization | Minimum VRAM | Recommended VRAM | CPU RAM |
|--------------|--------------|------------------|---------|
| FP16 | 8 GB | 12 GB | 16 GB |
| Q8_0 | 5 GB | 8 GB | 12 GB |
| Q6_K | 4 GB | 6 GB | 10 GB |
| Q5_K_M | 3 GB | 5 GB | 8 GB |
| Q4_K_M | 3 GB | 4 GB | 6 GB |
| Q3_K_M | 2 GB | 3 GB | 4 GB |
| Q2_K | 2 GB | 2 GB | 3 GB |
| IQ4_XS | 3 GB | 4 GB | 6 GB |
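
As a rough illustration, the CPU RAM column above can drive an automatic choice of quantization level; a sketch that assumes the third-party `psutil` package and treats the table values as GiB:

```python
# Illustrative only: pick the highest-quality quant whose recommended CPU RAM
# (per the table above) fits on this machine. Requires `pip install psutil`.
import psutil

QUANTS = [  # (file tag, recommended CPU RAM in GiB), highest quality first
    ("f16", 16), ("Q8_0", 12), ("Q6_K", 10), ("Q5_K_M", 8),
    ("Q4_K_M", 6), ("Q3_K_M", 4), ("Q2_K", 3),
]

ram_gib = psutil.virtual_memory().total / 2**30
tag = next((t for t, need in QUANTS if ram_gib >= need), "Q2_K")
print(f"Suggested file: Gemma3-Code-Reasoning-4B.{tag}.gguf")
```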

## 📈 Performance Expectations

This finetuned model is expected to show improved performance on:

- **Competitive Programming Problems**: Better understanding of problem constraints and requirements
- **Code Generation**: More accurate and efficient solutions
- **Reasoning Quality**: Enhanced step-by-step reasoning for complex problems
- **Solution Completeness**: More comprehensive solutions with proper edge case handling

## 🔗 Related Resources

- **Base Model**: [GetSoloTech/Gemma3-Code-Reasoning-4B](https://huggingface.co/GetSoloTech/Gemma3-Code-Reasoning-4B)
- **Training Dataset**: [GetSoloTech/Code-Reasoning](https://huggingface.co/datasets/GetSoloTech/Code-Reasoning)
- **Original Gemma Model**: [google/gemma-3-4b-it](https://huggingface.co/google/gemma-3-4b-it)
- **llama.cpp**: [GitHub Repository](https://github.com/ggerganov/llama.cpp)
- **llama-cpp-python**: [PyPI Package](https://pypi.org/project/llama-cpp-python/)

## 🤝 Contributing

This model was created using the Unsloth framework and the Code-Reasoning dataset. For questions about:

- The base model: [Gemma 3 on Hugging Face](https://huggingface.co/google/gemma-3-4b-it)
- The training dataset: [Code-Reasoning Repository](https://huggingface.co/datasets/GetSoloTech/Code-Reasoning)
- The training framework: [Unsloth Documentation](https://github.com/unslothai/unsloth)

## πŸ™ Acknowledgments

- **Gemma Team** for the excellent base model
- **Unsloth Team** for the efficient training framework
- **NVIDIA Research** for the original OpenCodeReasoning-2 dataset
- **llama.cpp community** for the GGUF format and tools

## 📞 Contact

For questions about this GGUF-converted model, please open an issue in this repository.

---

**Note**: This model is specifically optimized for competitive programming and code reasoning tasks. Choose the appropriate quantization level based on your hardware capabilities and quality requirements.