---
datasets:
- GetSoloTech/Code-Reasoning
language:
- en
base_model:
- GetSoloTech/GPT-OSS-Code-Reasoning-20B
pipeline_tag: text-generation
tags:
- coding
- reasoning
- problem-solving
- algorithms
- python
- c++
- code-reasoning
- competitive-programming
---

# GPT-OSS-Code-Reasoning-20B-GGUF

<img src="gpt-oss-reasoning.png" width="700"/>

This is the GGUF-quantized version of the [GPT-OSS-Code-Reasoning-20B](https://huggingface.co/GetSoloTech/GPT-OSS-Code-Reasoning-20B) model, optimized for efficient inference with reduced memory requirements.

## Overview

- **Base model**: `openai/gpt-oss-20b`
- **Objective**: Supervised fine-tuning for competitive programming and algorithmic reasoning
- **Format**: GGUF (optimized for llama.cpp and compatible inference engines)

## Model Variants

This GGUF model is available in multiple quantization levels to suit different hardware requirements:

| Quantization | Size | Memory Usage | Quality |
|--------------|------|--------------|---------|
| Q3_K_M       | 12.9 GB | ~13 GB | Average |
| Q4_K_M       | 15.8 GB | ~16 GB | Good |
| Q5_K_M       | 16.9 GB | ~17 GB | Better |
| Q8_0         | 22.3 GB | ~23 GB | Best |
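
A specific variant can also be fetched programmatically with `huggingface_hub` (a minimal sketch; the filename follows the pattern used in the Quick Start section below):

```python
from huggingface_hub import hf_hub_download

# Download one quantization variant from the Hub; the file is cached locally.
model_path = hf_hub_download(
    repo_id="GetSoloTech/GPT-OSS-Code-Reasoning-20B-GGUF",
    filename="gpt-oss-code-reasoning-20b.Q4_K_M.gguf",
)
print(model_path)  # local path to the downloaded GGUF file
```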

## Intended Use

- **Intended**: Generating Python/C++ solutions and reasoning for competitive programming tasks
- **Out of scope**: Safety-critical applications; the model may hallucinate or produce incorrect or inefficient code

## Quick Start

### Using llama.cpp

```bash
# Download the model
wget https://huggingface.co/GetSoloTech/GPT-OSS-Code-Reasoning-20B-GGUF/resolve/main/gpt-oss-code-reasoning-20b.Q4_K_M.gguf

# Run inference (the llama.cpp CLI binary is llama-cli)
./llama-cli -m gpt-oss-code-reasoning-20b.Q4_K_M.gguf -p "Write an O(n) two-sum solution in Python." -n 512 --repeat-penalty 1.1
```

### Using Python with llama-cpp-python

```python
from llama_cpp import Llama

# Load the model
llm = Llama(
    model_path="./gpt-oss-code-reasoning-20b.Q4_K_M.gguf",
    n_ctx=4096,
    n_threads=8
)

# Example problem
problem_text = """
You are given an array of integers nums and an integer target. 
Return indices of the two numbers such that they add up to target.
"""

# Create the prompt
prompt = f"""<|im_start|>system
You are an expert competitive programmer. Read the problem and produce a correct, efficient solution. Include reasoning if helpful.
<|im_end|>
<|im_start|>user
{problem_text}
<|im_end|>
<|im_start|>assistant
"""

# Generate response
output = llm(
    prompt,
    max_tokens=768,
    temperature=0.3,
    top_p=0.9,
    repeat_penalty=1.1,
    stop=["<|im_end|>"]
)

print(output['choices'][0]['text'])
```

### Using Ollama

```bash
# Create a Modelfile
cat > Modelfile << EOF
FROM ./gpt-oss-code-reasoning-20b.Q4_K_M.gguf
TEMPLATE """<|im_start|>system
{{ .System }}
<|im_end|>
<|im_start|>user
{{ .Prompt }}
<|im_end|>
<|im_start|>assistant
"""
PARAMETER temperature 0.3
PARAMETER top_p 0.9
PARAMETER repeat_penalty 1.1
EOF

# Create and run the model
ollama create code-reasoning -f Modelfile
ollama run code-reasoning "Solve this competitive programming problem: [your problem here]"
```

## Prompt Format

This model was trained in a chat format. Recommended structure:

```python
messages = [
    {"role": "system", "content": "You are an expert competitive programmer. Read the problem and produce a correct, efficient solution. Include reasoning if helpful."},
    {"role": "user", "content": problem_text},
]
```

When building the prompt string by hand (as in the llama.cpp example above), use the following ChatML-style format:

```
<|im_start|>system
You are an expert competitive programmer. Read the problem and produce a correct, efficient solution. Include reasoning if helpful.
<|im_end|>
<|im_start|>user
{problem_text}
<|im_end|>
<|im_start|>assistant
```
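
With llama-cpp-python you generally do not need to assemble this string by hand: the chat API can apply a ChatML template for you. A minimal sketch, assuming the library's built-in `chatml` format matches this model's training template:

```python
from llama_cpp import Llama

# chat_format="chatml" applies the <|im_start|>/<|im_end|> template shown above.
llm = Llama(
    model_path="./gpt-oss-code-reasoning-20b.Q4_K_M.gguf",
    n_ctx=4096,
    chat_format="chatml",
)

response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are an expert competitive programmer. Read the problem and produce a correct, efficient solution. Include reasoning if helpful."},
        {"role": "user", "content": "Implement two-sum in O(n) time."},
    ],
    max_tokens=768,
    temperature=0.3,
    stop=["<|im_end|>"],
)
print(response["choices"][0]["message"]["content"])
```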

## Generation Tips

- **Reasoning style**: Use a lower temperature (0.2–0.5) for clearer step-by-step reasoning
- **Length**: Use `max_tokens` of 512–1024 for full solutions; shorter for hints
- **Stop tokens**: The model uses `<|im_end|>` to end the assistant turn (see the streaming sketch below)
- **Memory optimization**: Choose the quantization level that fits your hardware (see the tables above and below)
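
These settings can be combined with streaming so the reasoning is visible as it is generated. A minimal sketch with llama-cpp-python, reusing the `llm` and `prompt` objects from the Quick Start example:

```python
# Stream tokens as they arrive instead of waiting for the full completion.
for chunk in llm(
    prompt,
    max_tokens=768,
    temperature=0.3,
    top_p=0.9,
    stop=["<|im_end|>"],  # end of the assistant turn
    stream=True,
):
    print(chunk["choices"][0]["text"], end="", flush=True)
print()
```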

## Hardware Requirements

| Quantization | Minimum RAM | Recommended RAM | GPU VRAM |
|--------------|-------------|-----------------|----------|
| Q3_K_M       | 8 GB        | 16 GB           | 8 GB     |
| Q4_K_M       | 12 GB       | 24 GB           | 12 GB    |
| Q5_K_M       | 16 GB       | 32 GB           | 16 GB    |
| Q8_0         | 24 GB       | 48 GB           | 24 GB    |
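
If a GPU is available, layers can be offloaded to VRAM with llama-cpp-python's `n_gpu_layers` parameter, which lets a model split across VRAM and system RAM (a sketch; requires a CUDA- or Metal-enabled build of llama-cpp-python):

```python
from llama_cpp import Llama

# Offload all layers to the GPU; use a smaller count to split with system RAM.
llm = Llama(
    model_path="./gpt-oss-code-reasoning-20b.Q4_K_M.gguf",
    n_ctx=4096,
    n_gpu_layers=-1,  # -1 = offload every layer
)
```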

## Performance Notes

- **Speed**: Runs efficiently on CPU via llama.cpp, with optional partial or full GPU offload
- **Memory**: Substantially smaller footprint than the original full-precision weights
- **Quality**: Minimal quality loss at higher quantization levels; Q4_K_M is a good size/quality trade-off
- **Compatibility**: Works with llama.cpp, llama-cpp-python, Ollama, and other GGUF-compatible engines


## Acknowledgements

- Original model: [GetSoloTech/GPT-OSS-Code-Reasoning-20B](https://huggingface.co/GetSoloTech/GPT-OSS-Code-Reasoning-20B)
- Base model: `openai/gpt-oss-20b`
- Dataset: `nvidia/OpenCodeReasoning-2`
- Upstream benchmarks: TACO, APPS, DeepMind CodeContests, `open-r1/codeforces`