---
datasets:
- GetSoloTech/Code-Reasoning
language:
- en
base_model:
- GetSoloTech/GPT-OSS-Code-Reasoning-20B
pipeline_tag: text-generation
tags:
- coding
- reasoning
- problem-solving
- algorithms
- python
- c++
- code-reasoning
- competitive-programming
---
# GPT-OSS-Code-Reasoning-20B-GGUF
<img src="gpt-oss-reasoning.png" width="700"/>
This is the GGUF quantized version of the [GPT-OSS-Code-Reasoning-20B](https://huggingface.co/GetSoloTech/GPT-OSS-Code-Reasoning-20B) model, optimized for efficient inference with reduced memory requirements.
## Overview
- **Base model**: `openai/gpt-oss-20b`
- **Objective**: Supervised fine-tuning for competitive programming and algorithmic reasoning
- **Format**: GGUF (optimized for llama.cpp and compatible inference engines)
## Model Variants
This GGUF model is available in multiple quantization levels to suit different hardware requirements:
| Quantization | Size | Memory Usage | Quality |
|--------------|------|--------------|---------|
| Q3_K_M | 12.9 GB | ~13 GB | Average |
| Q4_K_M | 15.8 GB | ~16 GB | Good |
| Q5_K_M | 16.9 GB | ~17 GB | Better |
| Q8_0 | 22.3 GB | ~23 GB | Best |
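To fetch a single quantization programmatically, you can use the `huggingface_hub` library; a minimal sketch (the filename follows the pattern used in the Quick Start below and may need adjusting to the repo's actual file listing):
```python
from huggingface_hub import hf_hub_download

# Download one quantization level into the local Hugging Face cache
model_path = hf_hub_download(
    repo_id="GetSoloTech/GPT-OSS-Code-Reasoning-20B-GGUF",
    filename="gpt-oss-code-reasoning-20b.Q4_K_M.gguf",  # assumed filename, as in Quick Start
)
print(model_path)  # local path to the cached .gguf file
```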
## Intended Use
- **Intended**: Generating Python/C++ solutions and reasoning for competitive programming tasks
- **Out of scope**: Safety-critical applications; the model may hallucinate or produce incorrect or inefficient code
## Quick Start
### Using llama.cpp
```bash
# Download the model
wget https://huggingface.co/GetSoloTech/GPT-OSS-Code-Reasoning-20B-GGUF/resolve/main/gpt-oss-code-reasoning-20b.Q4_K_M.gguf
# Run inference
./llama-cli -m gpt-oss-code-reasoning-20b.Q4_K_M.gguf -n 512 --repeat-penalty 1.1 -p "Your problem here"
```
### Using Python with llama-cpp-python
```python
from llama_cpp import Llama
# Load the model
llm = Llama(
    model_path="./gpt-oss-code-reasoning-20b.Q4_K_M.gguf",
    n_ctx=4096,
    n_threads=8,
)
# Example problem
problem_text = """
You are given an array of integers nums and an integer target.
Return indices of the two numbers such that they add up to target.
"""
# Create the prompt
prompt = f"""<|im_start|>system
You are an expert competitive programmer. Read the problem and produce a correct, efficient solution. Include reasoning if helpful.
<|im_end|>
<|im_start|>user
{problem_text}
<|im_end|>
<|im_start|>assistant
"""
# Generate response
output = llm(
    prompt,
    max_tokens=768,
    temperature=0.3,
    top_p=0.9,
    repeat_penalty=1.1,
    stop=["<|im_end|>"],
)
print(output['choices'][0]['text'])
```
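For interactive use, llama-cpp-python can also stream tokens as they are generated; a minimal sketch reusing `llm` and `prompt` from above:
```python
# Stream the completion token-by-token instead of waiting for the full response
for chunk in llm(
    prompt,
    max_tokens=768,
    temperature=0.3,
    stop=["<|im_end|>"],
    stream=True,
):
    print(chunk["choices"][0]["text"], end="", flush=True)
```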
### Using Ollama
```bash
# Create a Modelfile
cat > Modelfile << EOF
FROM ./gpt-oss-code-reasoning-20b.Q4_K_M.gguf
TEMPLATE """<|im_start|>system
{{ .System }}
<|im_end|>
<|im_start|>user
{{ .Prompt }}
<|im_end|>
<|im_start|>assistant
"""
PARAMETER temperature 0.3
PARAMETER top_p 0.9
PARAMETER repeat_penalty 1.1
EOF
# Create and run the model
ollama create code-reasoning -f Modelfile
ollama run code-reasoning "Solve this competitive programming problem: [your problem here]"
```
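Once created, the model can also be queried through Ollama's local HTTP API; a sketch assuming the default port 11434 and the `code-reasoning` name registered above:
```python
import requests

# Ask the local Ollama server for a single (non-streamed) completion
response = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "code-reasoning",
        "prompt": "Solve this competitive programming problem: [your problem here]",
        "stream": False,  # return one JSON object instead of streamed chunks
    },
)
print(response.json()["response"])
```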
## Prompt Format
This model was trained in a chat format. Recommended structure:
```python
messages = [
{"role": "system", "content": "You are an expert competitive programmer. Read the problem and produce a correct, efficient solution. Include reasoning if helpful."},
{"role": "user", "content": problem_text},
]
```
For GGUF models, use the following format:
```
<|im_start|>system
You are an expert competitive programmer. Read the problem and produce a correct, efficient solution. Include reasoning if helpful.
<|im_end|>
<|im_start|>user
{problem_text}
<|im_end|>
<|im_start|>assistant
```
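With llama-cpp-python you typically don't need to assemble this string by hand: `create_chat_completion` applies the chat template embedded in the GGUF file. A sketch reusing the `messages` list above and the `llm` object from the Quick Start:
```python
# Let llama-cpp-python format `messages` with the model's built-in chat template
result = llm.create_chat_completion(
    messages=messages,
    max_tokens=768,
    temperature=0.3,
)
print(result["choices"][0]["message"]["content"])
```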
## Generation Tips
- **Reasoning style**: Lower temperature (0.2–0.5) for clearer step-by-step reasoning
- **Length**: Use `max_tokens` of 512–1024 for full solutions; shorter for hints
- **Stop tokens**: The model uses `<|im_end|>` as a stop token
- **Memory optimization**: Choose the appropriate quantization level based on your hardware
## Hardware Requirements
| Quantization | Minimum RAM | Recommended RAM | GPU VRAM |
|--------------|-------------|-----------------|----------|
| Q3_K_M | 8 GB | 16 GB | 8 GB |
| Q4_K_M | 12 GB | 24 GB | 12 GB |
| Q5_K_M | 16 GB | 32 GB | 16 GB |
| Q8_0 | 24 GB | 48 GB | 24 GB |
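With a CUDA or Metal build of llama-cpp-python, layers can be offloaded to the GPU via the `n_gpu_layers` parameter; a minimal sketch (per the table above, Q4_K_M needs about 12 GB of VRAM when fully offloaded):
```python
from llama_cpp import Llama

# Offload every layer to the GPU; use a smaller positive number for partial offload
llm = Llama(
    model_path="./gpt-oss-code-reasoning-20b.Q4_K_M.gguf",
    n_ctx=4096,
    n_gpu_layers=-1,  # -1 = all layers; reduce if you run out of VRAM
)
```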
## Performance Notes
- **Speed**: GGUF runs on llama.cpp's optimized CPU and GPU kernels for fast inference
- **Memory**: Quantization substantially reduces the memory footprint compared to the unquantized model (see the tables above)
- **Quality**: Q5_K_M and Q8_0 stay close to the original model; lower quantization levels trade quality for memory
- **Compatibility**: Works with llama.cpp, llama-cpp-python, Ollama, and other GGUF-compatible engines
## Acknowledgements
- Original model: [GetSoloTech/GPT-OSS-Code-Reasoning-20B](https://huggingface.co/GetSoloTech/GPT-OSS-Code-Reasoning-20B)
- Base model: `openai/gpt-oss-20b`
- Dataset: `nvidia/OpenCodeReasoning-2`
- Upstream benchmarks: TACO, APPS, DeepMind CodeContests, `open-r1/codeforces`