---
library_name: transformers
tags:
- trl
- sft
- code
- competitive-programming
- codeforces
- lora
license: mit
datasets:
- open-r1/codeforces-cots
base_model:
- google/gemma-2-2b-it
pipeline_tag: text-generation
---

# Gemma-2-2b Fine-tuned for Competitive Programming

This model is a fine-tuned version of [google/gemma-2-2b-it](https://huggingface.co/google/gemma-2-2b-it) on the [open-r1/codeforces-cots](https://huggingface.co/datasets/open-r1/codeforces-cots) dataset for competitive programming problem solving.

## Model Details

### Model Description

This model has been fine-tuned using LoRA (Low-Rank Adaptation) on competitive programming problems from Codeforces. It is designed to help generate solutions for the algorithm and data structure problems commonly found in competitive programming contests.

- **Developed by:** Aswith77
- **Model type:** Causal Language Model (Code Generation)
- **Language(s):** Python, C++, Java (primarily Python)
- **License:** MIT
- **Finetuned from model:** google/gemma-2-2b-it
- **Fine-tuning method:** LoRA (Low-Rank Adaptation)

### Model Sources

- **Repository:** [Hugging Face Model](https://huggingface.co/Aswith77/gemma-2-2b-it-finetune-codeforces-cots)
- **Base Model:** [google/gemma-2-2b-it](https://huggingface.co/google/gemma-2-2b-it)
- **Dataset:** [open-r1/codeforces-cots](https://huggingface.co/datasets/open-r1/codeforces-cots)

## Uses

### Direct Use

This model is intended for generating solutions to competitive programming problems, particularly those similar to Codeforces problems. It can:

- Generate algorithmic solutions for given problem statements
- Help with code completion for competitive programming
- Assist in learning algorithmic problem-solving patterns

### Downstream Use

The model can be further fine-tuned on:

- Specific programming languages
- Domain-specific algorithmic problems
- Educational coding platforms

### Out-of-Scope Use

This model should not be used for:

- Production code without thorough testing
- Security-critical applications
- General-purpose software development without validation
- Problems requiring real-world system design

## How to Get Started with the Model

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

# Load the base model
base_model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2-2b-it",
    torch_dtype=torch.float16,
    device_map="auto"
)

# Load the fine-tuned LoRA adapters
model = PeftModel.from_pretrained(
    base_model,
    "Aswith77/gemma-2-2b-it-finetune-codeforces-cots"
)

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained("google/gemma-2-2b-it")

# Generate code for a problem
problem = """
Given an array of integers, find the maximum sum of a contiguous subarray.
Input: [-2,1,-3,4,-1,2,1,-5,4]
Output: 6 (subarray [4,-1,2,1])
"""

# Move inputs onto the same device as the model to avoid a device mismatch
inputs = tokenizer(problem, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_length=512,
    temperature=0.7,
    do_sample=True,
    pad_token_id=tokenizer.eos_token_id
)
solution = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(solution)
```

## Training Details

### Training Data

The model was trained on the [open-r1/codeforces-cots](https://huggingface.co/datasets/open-r1/codeforces-cots) dataset, specifically using 1,000 competitive programming problems and their solutions from Codeforces.
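As a rough illustration (not the exact preprocessing script), a 1,000-example subset can be drawn with the `datasets` library. The `solutions` config name and the shuffle seed below are assumptions; check the dataset card for the exact subset and column names.

```python
from datasets import load_dataset

# The "solutions" config name is an assumption; inspect the dataset card
# for the exact subset and column names if it differs.
dataset = load_dataset("open-r1/codeforces-cots", "solutions", split="train")

# Draw a 1,000-example subset, matching the size described above.
subset = dataset.shuffle(seed=42).select(range(1000))

print(subset)     # number of rows and column names
print(subset[0])  # one problem/solution example
```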
### Training Procedure

#### Training Hyperparameters

- **Training regime:** fp16 mixed precision
- **Learning rate:** 2e-4
- **Batch size:** 1 (per device)
- **Gradient accumulation steps:** 2
- **Max steps:** 100
- **Warmup steps:** 5
- **Optimizer:** AdamW 8-bit
- **Weight decay:** 0.01
- **LoRA rank (r):** 16
- **LoRA alpha:** 32
- **LoRA dropout:** 0.1
- **Target modules:** q_proj, v_proj, k_proj, o_proj, gate_proj, up_proj, down_proj

A hedged configuration sketch that reproduces these values is given in the appendix at the end of this card.

#### Speeds, Sizes, Times

- **Training time:** ~20 minutes
- **Hardware:** Tesla T4 GPU (16GB)
- **Model size:** ~30MB (LoRA adapters only)
- **Final training loss:** 0.3715
- **Training samples per second:** 0.338

## Evaluation

The training loss fell from an initial 0.9303 to a final 0.3715, and the loss curve showed steady improvement throughout training without signs of overfitting. Note that these figures reflect training loss only; no held-out benchmark evaluation is reported.

### Training Loss Progression

- Initial loss: 0.9303
- Final loss: 0.3715
- Loss reduction: ~60%

## Bias, Risks, and Limitations

### Limitations

- **Dataset bias:** Trained primarily on Codeforces problems; it may not generalize well to other competitive programming platforms
- **Language bias:** Solutions may favor programming patterns common in the training data
- **Size limitations:** As a 2B-parameter model, it may struggle with very complex algorithmic problems
- **Code correctness:** Generated code should always be tested and validated before use

### Recommendations

- Always test generated solutions with multiple test cases
- Use the model as a starting point, not a final solution
- Verify algorithmic correctness and time complexity
- Treat the model's suggestions as one approach among many possible solutions

## Environmental Impact

Training was conducted on a single Tesla T4 GPU for approximately 20 minutes, resulting in minimal environmental impact compared to larger-scale training runs.

- **Hardware Type:** Tesla T4 GPU
- **Hours used:** 0.33 hours
- **Cloud Provider:** Kaggle
- **Compute Region:** Not specified
- **Carbon Emitted:** Minimal (estimated < 0.1 kg CO2eq)

## Technical Specifications

### Model Architecture and Objective

- **Base Architecture:** Gemma-2 (2B parameters)
- **Fine-tuning Method:** LoRA (Low-Rank Adaptation)
- **Objective:** Causal language modeling with supervised fine-tuning
- **Quantization:** 4-bit quantization during training

### Compute Infrastructure

#### Hardware

- **GPU:** Tesla T4 (16GB VRAM)
- **Platform:** Kaggle Notebooks

#### Software

- **Framework:** PyTorch, Transformers, PEFT, TRL
- **Quantization:** bitsandbytes 4-bit
- **Training:** Supervised Fine-Tuning (SFT)

## Model Card Authors

Created by Aswith77 during fine-tuning experiments with competitive programming datasets.

## Model Card Contact

For questions or issues regarding this model, please open an issue in the model repository.
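## Appendix: Training Configuration Sketch

The following is a minimal sketch, assuming TRL's `SFTTrainer`/`SFTConfig` API, that mirrors the hyperparameters and 4-bit quantization described under Training Procedure and Technical Specifications. It is not the exact script used for this model: the `solutions` dataset config, the `output_dir`, and the assumption that the dataset exposes a column the trainer can consume directly (e.g. a chat-style `messages` field) are illustrative, and exact argument names can vary across TRL versions.

```python
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from trl import SFTConfig, SFTTrainer

# Load the base model in 4-bit, as described under "Technical Specifications".
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)
model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2-2b-it",
    quantization_config=bnb_config,
    device_map="auto",
)

# LoRA configuration matching the hyperparameters listed above.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.1,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    task_type="CAUSAL_LM",
)

# Training arguments matching the values listed above;
# "adamw_bnb_8bit" is the bitsandbytes 8-bit AdamW ("AdamW 8-bit" in the card).
training_args = SFTConfig(
    output_dir="gemma-2-2b-codeforces-lora",  # illustrative path
    per_device_train_batch_size=1,
    gradient_accumulation_steps=2,
    learning_rate=2e-4,
    max_steps=100,
    warmup_steps=5,
    weight_decay=0.01,
    optim="adamw_bnb_8bit",
    fp16=True,
    logging_steps=10,
)

# 1,000-example subset; the "solutions" config name is an assumption
# (see the Training Data section).
train_dataset = load_dataset(
    "open-r1/codeforces-cots", "solutions", split="train[:1000]"
)

trainer = SFTTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    peft_config=lora_config,
)
trainer.train()
trainer.save_model()  # saves the LoRA adapters to output_dir
```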