---
datasets:
  - GetSoloTech/Code-Reasoning
base_model:
  - GetSoloTech/Gemma3-Code-Reasoning-4B
pipeline_tag: text-generation
tags:
  - coding
  - reasoning
  - problem-solving
  - algorithms
  - python
  - c++
  - code-reasoning
  - competitive-programming
---

Gemma3-Code-Reasoning-4B-GGUF

This repository contains GGUF quantized versions of the GetSoloTech/Gemma3-Code-Reasoning-4B model for local inference with llama.cpp and compatible runtimes. Multiple quantization levels are provided to balance output quality against resource usage.

🎯 Model Overview

This is a LoRA-finetuned version of gemma-3-4b-it specifically optimized for competitive programming and code reasoning tasks. The model has been trained on the high-quality Code-Reasoning dataset to enhance its capabilities in solving complex programming problems with detailed reasoning.
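
If you want to inspect the training data, the dataset named in the metadata can be loaded with the Hugging Face datasets library. A minimal sketch; the "train" split name and the record fields are assumptions, so print one example to check:

```python
# Sketch: peek at the Code-Reasoning dataset (split name "train" is an assumption).
from datasets import load_dataset

ds = load_dataset("GetSoloTech/Code-Reasoning", split="train")
print(ds[0])  # inspect one record to see the actual field names
```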

πŸš€ Key Features

  • Enhanced Code Reasoning: Specifically trained on competitive programming problems
  • Thinking Capabilities: Inherits the advanced reasoning capabilities from the base model
  • High-Quality Solutions: Trained on solutions with β‰₯85% test case pass rates
  • Structured Output: Optimized for generating well-reasoned programming solutions
  • Efficient Training: Uses LoRA adapters for efficient parameter updates
  • Multiple Quantization Levels: Available in various GGUF formats for different hardware capabilities

πŸ“ Available GGUF Models

| Model File | Size | Quantization | Use Case |
|------------|------|--------------|----------|
| Gemma3-Code-Reasoning-4B.f16.gguf | 7.77 GB | FP16 | Highest quality, requires the most VRAM |
| Gemma3-Code-Reasoning-4B.Q8_0.gguf | 4.13 GB | Q8_0 | High quality, good balance |
| Gemma3-Code-Reasoning-4B.Q6_K.gguf | 3.19 GB | Q6_K | Good quality, moderate VRAM usage |
| Gemma3-Code-Reasoning-4B.Q5_K_M.gguf | 2.83 GB | Q5_K_M | Balanced quality and size |
| Gemma3-Code-Reasoning-4B.Q4_K_M.gguf | 2.49 GB | Q4_K_M | Good compression, reasonable quality |
| Gemma3-Code-Reasoning-4B.Q3_K_M.gguf | 2.10 GB | Q3_K_M | Smaller size, moderate quality |
| Gemma3-Code-Reasoning-4B.Q2_K.gguf | 1.73 GB | Q2_K | Smallest size, basic quality |
| Gemma3-Code-Reasoning-4B.IQ4_XS.gguf | 2.28 GB | IQ4_XS | Importance-matrix 4-bit quant; near-Q4_K_M quality at smaller size |
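
If you prefer Python over wget, a single quant can also be fetched with the huggingface_hub library; a minimal sketch using the repo and file names from the table above:

```python
# Sketch: download one quantization level into the local Hugging Face cache.
from huggingface_hub import hf_hub_download

model_path = hf_hub_download(
    repo_id="GetSoloTech/Gemma3-Code-Reasoning-4B-GGUF",
    filename="Gemma3-Code-Reasoning-4B.Q4_K_M.gguf",  # any file from the table works
)
print(model_path)  # local path to the downloaded GGUF
```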

πŸ”§ Usage

Using with llama.cpp

```bash
# Download a GGUF model file
wget https://huggingface.co/GetSoloTech/Gemma3-Code-Reasoning-4B-GGUF/resolve/main/Gemma3-Code-Reasoning-4B.Q4_K_M.gguf

# Run inference with llama.cpp (newer builds name the binary llama-cli rather than main)
./llama.cpp/main -m Gemma3-Code-Reasoning-4B.Q4_K_M.gguf -n 4096 --repeat-penalty 1.1 -p "You are an expert competitive programmer. Solve this problem: [YOUR_PROBLEM_HERE]"
```

Using with Python (llama-cpp-python)

```python
from llama_cpp import Llama

# Load the model
llm = Llama(
    model_path="./Gemma3-Code-Reasoning-4B.Q4_K_M.gguf",
    n_ctx=4096,     # context window size
    n_threads=4     # CPU threads; tune to your machine
)

# Prepare the prompt
prompt = """You are an expert competitive programmer. Read the problem and produce a correct, efficient solution. Include reasoning if helpful.

Problem: [YOUR_PROGRAMMING_PROBLEM_HERE]

Solution:"""

# Generate response
output = llm(
    prompt,
    max_tokens=4096,
    temperature=1.0,
    top_p=0.95,
    top_k=64,
    repeat_penalty=1.1
)

print(output['choices'][0]['text'])
```
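
Alternatively, llama-cpp-python's chat API applies the chat template embedded in the GGUF, so you don't have to build the prompt string by hand. A minimal sketch, reusing the llm object from above:

```python
# Sketch: create_chat_completion applies the model's built-in chat template.
response = llm.create_chat_completion(
    messages=[
        {
            "role": "user",
            "content": "You are an expert competitive programmer. "
                       "Solve this problem:\n\n[YOUR_PROGRAMMING_PROBLEM_HERE]",
        }
    ],
    max_tokens=4096,
    temperature=1.0,
    top_p=0.95,
)

print(response["choices"][0]["message"]["content"])
```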

πŸŽ›οΈ Recommended Settings

  • Temperature: 1.0
  • Top-p: 0.95
  • Top-k: 64
  • Max New Tokens: 4096 (adjust based on problem complexity)
  • Repeat Penalty: 1.1
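
These values map one-to-one onto llama-cpp-python's sampling keyword arguments, so they can be bundled once and reused across calls. A small sketch, assuming the llm and prompt objects from the Usage section:

```python
# The recommended settings above, as llama-cpp-python keyword arguments.
SAMPLING = dict(
    temperature=1.0,
    top_p=0.95,
    top_k=64,
    max_tokens=4096,    # "Max New Tokens"; reduce for simpler problems
    repeat_penalty=1.1,
)

output = llm(prompt, **SAMPLING)  # llm and prompt as defined in the Usage section
```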

πŸ’» Hardware Requirements

| Quantization | Minimum VRAM | Recommended VRAM | CPU RAM |
|--------------|--------------|------------------|---------|
| FP16 | 8 GB | 12 GB | 16 GB |
| Q8_0 | 5 GB | 8 GB | 12 GB |
| Q6_K | 4 GB | 6 GB | 10 GB |
| Q5_K_M | 3 GB | 5 GB | 8 GB |
| Q4_K_M | 3 GB | 4 GB | 6 GB |
| Q3_K_M | 2 GB | 3 GB | 4 GB |
| Q2_K | 2 GB | 2 GB | 3 GB |
| IQ4_XS | 3 GB | 4 GB | 6 GB |
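
When a quant does not fully fit in VRAM, llama-cpp-python can offload part of the model to the GPU and keep the remainder in CPU RAM. A hedged sketch, assuming a llama-cpp-python build with GPU support (e.g. CUDA or Metal):

```python
# Sketch: partial GPU offload to fit tight VRAM budgets.
from llama_cpp import Llama

llm = Llama(
    model_path="./Gemma3-Code-Reasoning-4B.Q4_K_M.gguf",
    n_ctx=4096,
    n_gpu_layers=-1,  # -1 offloads all layers; use a smaller count if VRAM is limited
)
```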

πŸ“ˆ Performance Expectations

This finetuned model is expected to show improved performance on:

  • Competitive Programming Problems: Better understanding of problem constraints and requirements
  • Code Generation: More accurate and efficient solutions
  • Reasoning Quality: Enhanced step-by-step reasoning for complex problems
  • Solution Completeness: More comprehensive solutions with proper edge case handling
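
Because the training data was filtered by test-case pass rate, a natural way to check generated solutions locally is to run them against sample cases. A hypothetical harness (not part of this repo), assuming solutions are Python programs that read stdin and write stdout:

```python
# Hypothetical verification harness: run a generated Python solution
# against (stdin, expected_stdout) sample cases.
import subprocess

def passes_tests(solution_code: str, cases: list[tuple[str, str]]) -> bool:
    for stdin, expected in cases:
        result = subprocess.run(
            ["python3", "-c", solution_code],
            input=stdin, capture_output=True, text=True, timeout=5,
        )
        if result.returncode != 0 or result.stdout.strip() != expected.strip():
            return False
    return True

# Toy A+B problem with one sample case
print(passes_tests("a, b = map(int, input().split()); print(a + b)", [("2 3", "5")]))
```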

πŸ”— Related Resources

🀝 Contributing

This model was created using the Unsloth framework and the Code-Reasoning dataset. Questions about the base model or training data are best directed to the resources linked above; questions about these GGUF conversions belong in this repository's issue tracker (see Contact below).

πŸ™ Acknowledgments

  • Gemma Team for the excellent base model
  • Unsloth Team for the efficient training framework
  • NVIDIA Research for the original OpenCodeReasoning-2 dataset
  • llama.cpp community for the GGUF format and tools

πŸ“ž Contact

For questions about these GGUF conversions, please open an issue in this repository.


Note: This model is specifically optimized for competitive programming and code reasoning tasks. Choose the appropriate quantization level based on your hardware capabilities and quality requirements.