---
datasets:
- GetSoloTech/Code-Reasoning
language:
- en
base_model:
- GetSoloTech/GPT-OSS-Code-Reasoning-20B
pipeline_tag: text-generation
tags:
- coding
- reasoning
- problem-solving
- algorithms
- python
- c++
- code-reasoning
- competitive-programming
---

# GPT-OSS-Code-Reasoning-20B-GGUF

<img src="gpt-oss-reasoning.png" width="700"/>

This is the GGUF-quantized version of the [GPT-OSS-Code-Reasoning-20B](https://huggingface.co/GetSoloTech/GPT-OSS-Code-Reasoning-20B) model, optimized for efficient inference with reduced memory requirements.

## Overview

- **Base model**: `openai/gpt-oss-20b`
- **Objective**: Supervised fine-tuning for competitive programming and algorithmic reasoning
- **Format**: GGUF (optimized for llama.cpp and compatible inference engines)

## Model Variants

This GGUF model is available in multiple quantization levels to suit different hardware requirements:

| Quantization | Size | Memory Usage | Quality |
|--------------|------|--------------|---------|
| Q3_K_M | 12.9 GB | ~13 GB | Average |
| Q4_K_M | 15.8 GB | ~16 GB | Good |
| Q5_K_M | 16.9 GB | ~17 GB | Better |
| Q8_0 | 22.3 GB | ~23 GB | Best |
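
To fetch a single variant instead of the whole repository, `huggingface-cli download` accepts a filename. A minimal sketch, assuming the other variants follow the same naming pattern as the Q4_K_M file used throughout this card:

```bash
# Download only the Q5_K_M variant (filename assumed from the Q4_K_M pattern)
huggingface-cli download GetSoloTech/GPT-OSS-Code-Reasoning-20B-GGUF \
  gpt-oss-code-reasoning-20b.Q5_K_M.gguf --local-dir .
```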

## Intended Use

- **Intended**: Generating Python/C++ solutions and reasoning for competitive programming tasks
- **Out of scope**: Safety-critical applications; the model may hallucinate or produce incorrect or inefficient code

## Quick Start

### Using llama.cpp

```bash
# Download the model
wget https://huggingface.co/GetSoloTech/GPT-OSS-Code-Reasoning-20B-GGUF/resolve/main/gpt-oss-code-reasoning-20b.Q4_K_M.gguf

# Run inference (llama-cli is the CLI binary built by llama.cpp)
./llama-cli -m gpt-oss-code-reasoning-20b.Q4_K_M.gguf \
  -p "Given an array nums and an integer target, return indices of two numbers that sum to target." \
  -n 512 --repeat-penalty 1.1
```
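
For programmatic access, llama.cpp also ships an HTTP server with an OpenAI-compatible API. A minimal sketch (flags as in recent llama.cpp builds; the server applies the chat template embedded in the GGUF):

```bash
# Start the server
./llama-server -m gpt-oss-code-reasoning-20b.Q4_K_M.gguf -c 4096 --port 8080

# Query the OpenAI-compatible chat endpoint
curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
  "messages": [
    {"role": "system", "content": "You are an expert competitive programmer."},
    {"role": "user", "content": "Given an array nums and an integer target, return indices of two numbers that sum to target."}
  ],
  "temperature": 0.3
}'
```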

### Using Python with llama-cpp-python

```python
from llama_cpp import Llama

# Load the model
llm = Llama(
    model_path="./gpt-oss-code-reasoning-20b.Q4_K_M.gguf",
    n_ctx=4096,
    n_threads=8
)

# Example problem
problem_text = """
You are given an array of integers nums and an integer target.
Return indices of the two numbers such that they add up to target.
"""

# Create the prompt
prompt = f"""<|im_start|>system
You are an expert competitive programmer. Read the problem and produce a correct, efficient solution. Include reasoning if helpful.
<|im_end|>
<|im_start|>user
{problem_text}
<|im_end|>
<|im_start|>assistant
"""

# Generate response
output = llm(
    prompt,
    max_tokens=768,
    temperature=0.3,
    top_p=0.9,
    repeat_penalty=1.1,
    stop=["<|im_end|>"]
)

print(output['choices'][0]['text'])
```
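
If the GGUF carries a chat template (llama.cpp conversions usually embed one), llama-cpp-python can build the prompt for you instead of the manual string above. A minimal sketch using the library's high-level chat API, reusing `llm` and `problem_text` from the example:

```python
# Let the library apply the model's embedded chat template
response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are an expert competitive programmer."},
        {"role": "user", "content": problem_text},
    ],
    max_tokens=768,
    temperature=0.3,
    top_p=0.9,
)
print(response["choices"][0]["message"]["content"])
```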

### Using Ollama

```bash
# Create a Modelfile
cat > Modelfile << EOF
FROM ./gpt-oss-code-reasoning-20b.Q4_K_M.gguf
TEMPLATE """<|im_start|>system
{{ .System }}
<|im_end|>
<|im_start|>user
{{ .Prompt }}
<|im_end|>
<|im_start|>assistant
"""
PARAMETER temperature 0.3
PARAMETER top_p 0.9
PARAMETER repeat_penalty 1.1
EOF

# Create and run the model
ollama create code-reasoning -f Modelfile
ollama run code-reasoning "Solve this competitive programming problem: [your problem here]"
```
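
Once the model is created, Ollama's local REST API (port 11434 by default) can serve it to other tools. A minimal sketch:

```bash
# One-shot generation over Ollama's REST API
curl http://localhost:11434/api/generate -d '{
  "model": "code-reasoning",
  "prompt": "Given an array nums and an integer target, return indices of two numbers that sum to target.",
  "stream": false
}'
```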

## Prompt Format

This model was trained in a chat format. Recommended structure:

```python
messages = [
    {"role": "system", "content": "You are an expert competitive programmer. Read the problem and produce a correct, efficient solution. Include reasoning if helpful."},
    {"role": "user", "content": problem_text},
]
```

For GGUF inference engines that take a raw prompt string, use the following format:

```
<|im_start|>system
You are an expert competitive programmer. Read the problem and produce a correct, efficient solution. Include reasoning if helpful.
<|im_end|>
<|im_start|>user
{problem_text}
<|im_end|>
<|im_start|>assistant
```
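
The two representations are equivalent: each message becomes an `<|im_start|>{role} ... <|im_end|>` block, and the prompt ends with an open assistant turn. A minimal helper (our own illustration, not a library function) that renders the messages list into the raw string:

```python
def render_chatml(messages: list[dict]) -> str:
    """Render chat messages into the raw prompt format above,
    leaving the assistant turn open for the model to complete."""
    parts = [f"<|im_start|>{m['role']}\n{m['content']}\n<|im_end|>" for m in messages]
    parts.append("<|im_start|>assistant\n")
    return "\n".join(parts)

prompt = render_chatml(messages)
```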

## Generation Tips

- **Reasoning style**: Use a lower temperature (0.2–0.5) for clearer step-by-step reasoning; see the parameter presets sketched below
- **Length**: Use `max_tokens` of 512–1024 for full solutions; less for short hints
- **Stop tokens**: The model uses `<|im_end|>` as a stop token
- **Memory optimization**: Choose the quantization level appropriate to your hardware
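
As a starting point, these two presets package the values above for the llama-cpp-python example (the preset names and the hint-length budget are our own choices, not tuned settings):

```python
# PRECISE favors deterministic, complete solutions; HINT favors short nudges
PRECISE = {"temperature": 0.2, "top_p": 0.9, "repeat_penalty": 1.1, "max_tokens": 1024}
HINT = {"temperature": 0.5, "top_p": 0.9, "repeat_penalty": 1.1, "max_tokens": 256}

output = llm(prompt, stop=["<|im_end|>"], **PRECISE)
```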

## Hardware Requirements

| Quantization | Minimum RAM | Recommended RAM | GPU VRAM |
|--------------|-------------|-----------------|----------|
| Q3_K_M | 8 GB | 16 GB | 8 GB |
| Q4_K_M | 12 GB | 24 GB | 12 GB |
| Q5_K_M | 16 GB | 32 GB | 16 GB |
| Q8_0 | 24 GB | 48 GB | 24 GB |
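
If your GPU has less VRAM than the table suggests, you can split the model between GPU and CPU instead of dropping to a smaller quantization. A minimal sketch with llama-cpp-python (the layer count is illustrative, not tuned; the llama.cpp CLI equivalent is `-ngl`):

```python
from llama_cpp import Llama

# Offload only part of the network to the GPU; the rest stays in system RAM
llm = Llama(
    model_path="./gpt-oss-code-reasoning-20b.Q4_K_M.gguf",
    n_ctx=4096,
    n_gpu_layers=20,  # raise until VRAM is full, or -1 to offload everything
)
```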

## Performance Notes

- **Speed**: GGUF is built for fast inference through llama.cpp's quantized CPU/GPU kernels
- **Memory**: Significantly reduced memory footprint compared to the original full-precision model
- **Quality**: Minimal quality loss at the higher quantization levels (see the variants table above)
- **Compatibility**: Works with llama.cpp, llama-cpp-python, Ollama, and other GGUF-compatible engines
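
Throughput varies widely with hardware, so it is worth measuring locally. llama.cpp includes a benchmarking tool for exactly this:

```bash
# Report prompt-processing and token-generation speed for this GGUF
./llama-bench -m gpt-oss-code-reasoning-20b.Q4_K_M.gguf
```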

## Acknowledgements

- Original model: [GetSoloTech/GPT-OSS-Code-Reasoning-20B](https://huggingface.co/GetSoloTech/GPT-OSS-Code-Reasoning-20B)
- Base model: `openai/gpt-oss-20b`
- Dataset: `nvidia/OpenCodeReasoning-2`
- Upstream benchmarks: TACO, APPS, DeepMind CodeContests, `open-r1/codeforces`