---
datasets:
- GetSoloTech/Code-Reasoning
language:
- en
base_model:
- GetSoloTech/GPT-OSS-Code-Reasoning-20B
pipeline_tag: text-generation
tags:
- coding
- reasoning
- problem-solving
- algorithms
- python
- c++
- code-reasoning
- competitive-programming
---

# GPT-OSS-Code-Reasoning-20B-GGUF

This is the GGUF quantized version of the [GPT-OSS-Code-Reasoning-20B](https://huggingface.co/GetSoloTech/GPT-OSS-Code-Reasoning-20B) model, optimized for efficient inference with reduced memory requirements.

## Overview

- **Base model**: `openai/gpt-oss-20b`
- **Objective**: Supervised fine-tuning for competitive programming and algorithmic reasoning
- **Format**: GGUF (optimized for llama.cpp and compatible inference engines)

## Model Variants

This GGUF model is available in multiple quantization levels to suit different hardware requirements:

| Quantization | Size | Memory Usage | Quality |
|--------------|------|--------------|---------|
| Q3_K_M | 12.9 GB | ~13 GB | Average |
| Q4_K_M | 15.8 GB | ~16 GB | Good |
| Q5_K_M | 16.9 GB | ~17 GB | Better |
| Q8_0 | 22.3 GB | ~23 GB | Best |

## Intended Use

- **Intended**: Generating Python/C++ solutions and reasoning for competitive programming tasks
- **Out of scope**: Safety-critical applications; the model may hallucinate or produce incorrect or inefficient code

## Quick Start

### Using llama.cpp

```bash
# Download the model
wget https://huggingface.co/GetSoloTech/GPT-OSS-Code-Reasoning-20B-GGUF/resolve/main/gpt-oss-code-reasoning-20b.Q4_K_M.gguf

# Run inference (llama-cli is the command-line binary built by llama.cpp)
./llama-cli -m gpt-oss-code-reasoning-20b.Q4_K_M.gguf -n 512 --repeat-penalty 1.1
```

### Using Python with llama-cpp-python

```python
from llama_cpp import Llama

# Load the model
llm = Llama(
    model_path="./gpt-oss-code-reasoning-20b.Q4_K_M.gguf",
    n_ctx=4096,
    n_threads=8
)

# Example problem
problem_text = """
You are given an array of integers nums and an integer target.
Return indices of the two numbers such that they add up to target.
"""

# Create the prompt
prompt = f"""<|im_start|>system
You are an expert competitive programmer. Read the problem and produce a correct, efficient solution. Include reasoning if helpful.
<|im_end|>
<|im_start|>user
{problem_text}
<|im_end|>
<|im_start|>assistant
"""

# Generate response
output = llm(
    prompt,
    max_tokens=768,
    temperature=0.3,
    top_p=0.9,
    repeat_penalty=1.1,
    stop=["<|im_end|>"]
)

print(output['choices'][0]['text'])
```

### Using Ollama

```bash
# Create a Modelfile
cat > Modelfile << EOF
FROM ./gpt-oss-code-reasoning-20b.Q4_K_M.gguf
TEMPLATE """<|im_start|>system
{{ .System }}
<|im_end|>
<|im_start|>user
{{ .Prompt }}
<|im_end|>
<|im_start|>assistant
"""
PARAMETER temperature 0.3
PARAMETER top_p 0.9
PARAMETER repeat_penalty 1.1
EOF

# Create and run the model
ollama create code-reasoning -f Modelfile
ollama run code-reasoning "Solve this competitive programming problem: [your problem here]"
```

## Prompt Format

This model was trained in a chat format. Recommended structure:

```python
messages = [
    {"role": "system", "content": "You are an expert competitive programmer. Read the problem and produce a correct, efficient solution. Include reasoning if helpful."},
    {"role": "user", "content": problem_text},
]
```

For GGUF models, use the following raw prompt format:

```
<|im_start|>system
You are an expert competitive programmer. Read the problem and produce a correct, efficient solution. Include reasoning if helpful.
<|im_end|>
<|im_start|>user
{problem_text}
<|im_end|>
<|im_start|>assistant
```
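If you prefer not to assemble the `<|im_start|>` markup by hand, llama-cpp-python also exposes a chat-style API that applies the chat template stored in the GGUF metadata. The sketch below is a minimal example assuming that embedded template matches the format above, reusing the Q4_K_M file from the Quick Start; adjust the path and sampling parameters for your setup.

```python
from llama_cpp import Llama

# Load the quantized model (assumes the Q4_K_M file downloaded in the Quick Start)
llm = Llama(
    model_path="./gpt-oss-code-reasoning-20b.Q4_K_M.gguf",
    n_ctx=4096,
    n_threads=8,
)

problem_text = (
    "You are given an array of integers nums and an integer target. "
    "Return indices of the two numbers such that they add up to target."
)

# create_chat_completion formats the messages with the model's chat template,
# so the <|im_start|>/<|im_end|> markers do not need to be written manually.
response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are an expert competitive programmer. Read the problem and produce a correct, efficient solution. Include reasoning if helpful."},
        {"role": "user", "content": problem_text},
    ],
    max_tokens=768,
    temperature=0.3,
    top_p=0.9,
)

print(response["choices"][0]["message"]["content"])
```

If the template embedded in your GGUF file differs from the format shown above, fall back to building the prompt string manually as in the Quick Start example.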
## Generation Tips

- **Reasoning style**: Lower temperature (0.2–0.5) for clearer step-by-step reasoning
- **Length**: Use `max_tokens` 512–1024 for full solutions; shorter for hints
- **Stop tokens**: The model uses `<|im_end|>` as a stop token
- **Memory optimization**: Choose the appropriate quantization level based on your hardware

## Hardware Requirements

| Quantization | Minimum RAM | Recommended RAM | GPU VRAM |
|--------------|-------------|-----------------|----------|
| Q3_K_M | 8 GB | 16 GB | 8 GB |
| Q4_K_M | 12 GB | 24 GB | 12 GB |
| Q5_K_M | 16 GB | 32 GB | 16 GB |
| Q8_0 | 24 GB | 48 GB | 24 GB |

## Performance Notes

- **Speed**: GGUF models are optimized for fast inference
- **Memory**: Significantly reduced memory footprint compared to the original model
- **Quality**: Minimal quality loss with appropriate quantization levels
- **Compatibility**: Works with llama.cpp, llama-cpp-python, Ollama, and other GGUF-compatible engines

## Acknowledgements

- Original model: [GetSoloTech/GPT-OSS-Code-Reasoning-20B](https://huggingface.co/GetSoloTech/GPT-OSS-Code-Reasoning-20B)
- Base model: `openai/gpt-oss-20b`
- Dataset: `nvidia/OpenCodeReasoning-2`
- Upstream benchmarks: TACO, APPS, DeepMind CodeContests, `open-r1/codeforces`