Gemma3-Code-Reasoning-4B-GGUF

This repository contains GGUF quantized versions of the GetSoloTech/Gemma3-Code-Reasoning-4B model for local inference with llama.cpp-compatible runtimes. Multiple quantization levels are provided to balance output quality against resource usage.

🎯 Model Overview

This is a LoRA-finetuned version of gemma-3-4b-it specifically optimized for competitive programming and code reasoning tasks. The model has been trained on the high-quality Code-Reasoning dataset to enhance its capabilities in solving complex programming problems with detailed reasoning.

🚀 Key Features

  • Enhanced Code Reasoning: Specifically trained on competitive programming problems
  • Thinking Capabilities: Inherits the advanced reasoning capabilities from the base model
  • High-Quality Solutions: Trained on solutions with ≥85% test-case pass rates
  • Structured Output: Optimized for generating well-reasoned programming solutions
  • Efficient Training: Uses LoRA adapters for efficient parameter updates
  • Multiple Quantization Levels: Available in various GGUF formats for different hardware capabilities

πŸ“ Available GGUF Models

| Model File | Size | Quantization | Use Case |
|------------|------|--------------|----------|
| Gemma3-Code-Reasoning-4B.f16.gguf | 7.77 GB | FP16 | Highest quality, requires the most VRAM |
| Gemma3-Code-Reasoning-4B.Q8_0.gguf | 4.13 GB | Q8_0 | High quality, good balance |
| Gemma3-Code-Reasoning-4B.Q6_K.gguf | 3.19 GB | Q6_K | Good quality, moderate VRAM usage |
| Gemma3-Code-Reasoning-4B.Q5_K_M.gguf | 2.83 GB | Q5_K_M | Balanced quality and size |
| Gemma3-Code-Reasoning-4B.Q4_K_M.gguf | 2.49 GB | Q4_K_M | Good compression, reasonable quality |
| Gemma3-Code-Reasoning-4B.Q3_K_M.gguf | 2.10 GB | Q3_K_M | Smaller size, moderate quality |
| Gemma3-Code-Reasoning-4B.Q2_K.gguf | 1.73 GB | Q2_K | Smallest size, basic quality |
| Gemma3-Code-Reasoning-4B.IQ4_XS.gguf | 2.28 GB | IQ4_XS | Importance-matrix 4-bit quant; near-Q4_K_M quality in less space |
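
If you prefer a scripted download over wget, the huggingface_hub client can fetch any of the files above; a minimal sketch using the Q4_K_M file as an example:

from huggingface_hub import hf_hub_download

# Download one quantization level from this repo; the file is cached
# locally and the local filesystem path is returned.
model_path = hf_hub_download(
    repo_id="GetSoloTech/Gemma3-Code-Reasoning-4B-GGUF",
    filename="Gemma3-Code-Reasoning-4B.Q4_K_M.gguf",
)
print(model_path)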

🔧 Usage

Using with llama.cpp

# Download a GGUF model file
wget https://huggingface.co/GetSoloTech/Gemma3-Code-Reasoning-4B-GGUF/resolve/main/Gemma3-Code-Reasoning-4B.Q4_K_M.gguf

# Run inference with llama.cpp (recent builds name the CLI binary llama-cli; older ones used main)
./llama.cpp/llama-cli -m Gemma3-Code-Reasoning-4B.Q4_K_M.gguf -n 4096 --repeat-penalty 1.1 -p "You are an expert competitive programmer. Solve this problem: [YOUR_PROBLEM_HERE]"
# Optional: add -ngl 99 to offload all layers to the GPU when built with GPU support
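
llama.cpp also includes llama-server, which serves the model behind an OpenAI-compatible HTTP API (for example ./llama.cpp/llama-server -m Gemma3-Code-Reasoning-4B.Q4_K_M.gguf --port 8080). A sketch of querying it from Python, assuming that port and the third-party requests package:

import requests

# POST to llama-server's OpenAI-compatible chat completions endpoint
resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "messages": [
            {"role": "user", "content": "Solve this problem: [YOUR_PROBLEM_HERE]"}
        ],
        "max_tokens": 4096,
        "temperature": 1.0,
    },
    timeout=600,
)
print(resp.json()["choices"][0]["message"]["content"])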

Using with Python (llama-cpp-python)

from llama_cpp import Llama

# Load the model
llm = Llama(
    model_path="./Gemma3-Code-Reasoning-4B.Q4_K_M.gguf",
    n_ctx=4096,         # context window size in tokens
    n_threads=4,        # set to your number of physical CPU cores
    # n_gpu_layers=-1,  # uncomment to offload all layers to GPU if available
)

# Prepare the prompt
prompt = """You are an expert competitive programmer. Read the problem and produce a correct, efficient solution. Include reasoning if helpful.

Problem: [YOUR_PROGRAMMING_PROBLEM_HERE]

Solution:"""

# Generate a response using the recommended sampling settings (see below)
output = llm(
    prompt,
    max_tokens=4096,    # adjust based on problem complexity
    temperature=1.0,
    top_p=0.95,
    top_k=64,
    repeat_penalty=1.1,
)

print(output['choices'][0]['text'])
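
For long solutions you can stream tokens as they are generated instead of waiting for the full completion; llama-cpp-python supports this via stream=True:

# Stream the completion chunk-by-chunk; each chunk carries a text fragment
for chunk in llm(prompt, max_tokens=4096, stream=True):
    print(chunk["choices"][0]["text"], end="", flush=True)
print()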

🎛️ Recommended Settings

  • Temperature: 1.0
  • Top-p: 0.95
  • Top-k: 64
  • Max New Tokens: 4096 (adjust based on problem complexity)
  • Repeat Penalty: 1.1
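
These settings map directly onto llama-cpp-python's chat API, which formats messages with the chat template embedded in the GGUF (Gemma 3's template folds a system turn into the first user turn); a minimal sketch reusing the llm object from above:

# create_chat_completion applies the GGUF's embedded chat template
response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are an expert competitive programmer."},
        {"role": "user", "content": "Problem: [YOUR_PROGRAMMING_PROBLEM_HERE]"},
    ],
    max_tokens=4096,
    temperature=1.0,
    top_p=0.95,
    top_k=64,
    repeat_penalty=1.1,
)
print(response["choices"][0]["message"]["content"])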

💻 Hardware Requirements

| Quantization | Minimum VRAM | Recommended VRAM | CPU RAM |
|--------------|--------------|------------------|---------|
| FP16 | 8 GB | 12 GB | 16 GB |
| Q8_0 | 5 GB | 8 GB | 12 GB |
| Q6_K | 4 GB | 6 GB | 10 GB |
| Q5_K_M | 3 GB | 5 GB | 8 GB |
| Q4_K_M | 3 GB | 4 GB | 6 GB |
| Q3_K_M | 2 GB | 3 GB | 4 GB |
| Q2_K | 2 GB | 2 GB | 3 GB |
| IQ4_XS | 3 GB | 4 GB | 6 GB |
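
As a rule of thumb, pick the largest quantization whose recommended VRAM fits your GPU. A hypothetical helper encoding the table above (RECOMMENDED_VRAM_GB and quant_for_vram are illustrative names, not part of any library):

# Recommended VRAM per quantization in GB, ordered best quality first
RECOMMENDED_VRAM_GB = {
    "FP16": 12, "Q8_0": 8, "Q6_K": 6, "Q5_K_M": 5,
    "Q4_K_M": 4, "IQ4_XS": 4, "Q3_K_M": 3, "Q2_K": 2,
}

def quant_for_vram(available_gb: float) -> str:
    """Return the highest-quality quantization that fits the given VRAM."""
    for quant, needed in RECOMMENDED_VRAM_GB.items():
        if needed <= available_gb:
            return quant  # dict preserves insertion order, so this is the best fit
    raise ValueError("No quantization fits; consider CPU-only inference")

print(quant_for_vram(6))  # -> Q6_K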

📈 Performance Expectations

This finetuned model is expected to show improved performance on:

  • Competitive Programming Problems: Better understanding of problem constraints and requirements
  • Code Generation: More accurate and efficient solutions
  • Reasoning Quality: Enhanced step-by-step reasoning for complex problems
  • Solution Completeness: More comprehensive solutions with proper edge case handling

🔗 Related Resources

  • Original (non-quantized) model: GetSoloTech/Gemma3-Code-Reasoning-4B
  • Base model: gemma-3-4b-it
  • Training data: the Code-Reasoning dataset (built from OpenCodeReasoning-2)
  • Tooling: llama.cpp and llama-cpp-python

🤝 Contributing

This model was created using the Unsloth framework and the Code-Reasoning dataset. For questions about the base model, the training framework, or the dataset, please refer to the respective upstream projects listed above; for issues specific to these GGUF files, use the contact information below.

🙏 Acknowledgments

  • Gemma Team for the excellent base model
  • Unsloth Team for the efficient training framework
  • NVIDIA Research for the original OpenCodeReasoning-2 dataset
  • llama.cpp community for the GGUF format and tools

📞 Contact

For questions about this GGUF converted model, please open an issue in the repository.


Note: This model is specifically optimized for competitive programming and code reasoning tasks. Choose the appropriate quantization level based on your hardware capabilities and quality requirements.
