---
library_name: transformers
tags:
- trl
- sft
- code
- competitive-programming
- codeforces
- lora
license: mit
datasets:
- open-r1/codeforces-cots
base_model:
- google/gemma-2-2b-it
pipeline_tag: text-generation
---
# Gemma-2-2b Fine-tuned for Competitive Programming
This model is a fine-tuned version of [google/gemma-2-2b-it](https://huggingface.co/google/gemma-2-2b-it) on the [open-r1/codeforces-cots](https://huggingface.co/datasets/open-r1/codeforces-cots) dataset for competitive programming problem solving.
## Model Details
### Model Description
This model has been fine-tuned using LoRA (Low-Rank Adaptation) on competitive programming problems from Codeforces. It's designed to help generate solutions for algorithmic and data structure problems commonly found in competitive programming contests.
- **Developed by:** Aswith77
- **Model type:** Causal Language Model (Code Generation)
- **Language(s):** Python, C++, Java (primarily Python)
- **License:** MIT
- **Finetuned from model:** google/gemma-2-2b-it
- **Fine-tuning method:** LoRA (Low-Rank Adaptation)
### Model Sources
- **Repository:** [Hugging Face Model](https://huggingface.co/Aswith77/gemma-2-2b-it-finetune-codeforces-cots)
- **Base Model:** [google/gemma-2-2b-it](https://huggingface.co/google/gemma-2-2b-it)
- **Dataset:** [open-r1/codeforces-cots](https://huggingface.co/datasets/open-r1/codeforces-cots)
## Uses
### Direct Use
This model is intended for generating solutions to competitive programming problems, particularly those similar to Codeforces problems. It can:
- Generate algorithmic solutions for given problem statements
- Help with code completion for competitive programming
- Assist in learning algorithmic problem-solving patterns
### Downstream Use
The model can be further fine-tuned on:
- Specific programming languages
- Domain-specific algorithmic problems
- Educational coding platforms
### Out-of-Scope Use
This model should not be used for:
- Production code without thorough testing
- Security-critical applications
- General-purpose software development without validation
- Problems requiring real-world system design
## How to Get Started with the Model
```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the base model
base_model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2-2b-it",
    torch_dtype=torch.float16,
    device_map="auto",
)

# Load the fine-tuned LoRA adapters on top of the base model
model = PeftModel.from_pretrained(
    base_model,
    "Aswith77/gemma-2-2b-it-finetune-codeforces-cots",
)
model.eval()

# Load the tokenizer (unchanged from the base model)
tokenizer = AutoTokenizer.from_pretrained("google/gemma-2-2b-it")

# Generate code for a problem
problem = """
Given an array of integers, find the maximum sum of a contiguous subarray.
Input: [-2,1,-3,4,-1,2,1,-5,4]
Output: 6 (subarray [4,-1,2,1])
"""

# Move the inputs to the same device as the model before generating
inputs = tokenizer(problem, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=512,  # budget for generated tokens only, excluding the prompt
    temperature=0.7,
    do_sample=True,
    pad_token_id=tokenizer.eos_token_id,
)

# The decoded text contains the prompt followed by the generated solution
solution = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(solution)
```
## Training Details
### Training Data
The model was trained on a 1,000-example subset of the [open-r1/codeforces-cots](https://huggingface.co/datasets/open-r1/codeforces-cots) dataset, which pairs competitive programming problems from Codeforces with their solutions.
### Training Procedure
#### Training Hyperparameters
- **Training regime:** fp16 mixed precision
- **Learning rate:** 2e-4
- **Batch size:** 1 (per device)
- **Gradient accumulation steps:** 2
- **Max steps:** 100
- **Warmup steps:** 5
- **Optimizer:** AdamW 8-bit
- **Weight decay:** 0.01
- **LoRA rank (r):** 16
- **LoRA alpha:** 32
- **LoRA dropout:** 0.1
- **Target modules:** q_proj, v_proj, k_proj, o_proj, gate_proj, up_proj, down_proj
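
The hyperparameters above imply a few derived quantities that are easy to miss. The following is a minimal sketch of that arithmetic, not part of the training script. It assumes a single GPU (as stated under Environmental Impact) and one training example per sequence with no packing, which this card does not state.

```python
# Values copied from the hyperparameter list above.
per_device_batch_size = 1
gradient_accumulation_steps = 2
max_steps = 100
lora_r = 16
lora_alpha = 32

# Assumptions: one GPU, and no sequence packing (not stated in this card).
num_devices = 1
dataset_size = 1_000  # examples, per the Training Data section

# Each optimizer step consumes this many examples.
effective_batch_size = (
    per_device_batch_size * gradient_accumulation_steps * num_devices
)
examples_seen = effective_batch_size * max_steps
fraction_of_dataset = examples_seen / dataset_size

# Standard LoRA scales the low-rank update by alpha / r.
lora_scaling = lora_alpha / lora_r

print(f"effective batch size: {effective_batch_size}")
print(f"examples seen:        {examples_seen}")
print(f"fraction of dataset:  {fraction_of_dataset:.0%}")
print(f"LoRA scaling factor:  {lora_scaling}")
```

Under these assumptions the run covers about a fifth of the 1,000 examples, so it is well short of one full epoch.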
#### Speeds, Sizes, Times
- **Training time:** ~20 minutes
- **Hardware:** Tesla T4 GPU (16GB)
- **Model size:** ~30MB (LoRA adapters only)
- **Final training loss:** 0.3715
- **Training samples per second:** 0.338
## Evaluation
Training loss fell from 0.9303 at the start of training to 0.3715 at the end, a reduction of about 60%. This is a training-set metric only: it indicates that optimization converged, but on its own it cannot rule out overfitting or measure whether generated solutions are correct.
### Training Loss Progression
- Initial loss: 0.9303
- Final loss: 0.3715
- Loss reduction: ~60%
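
For reference, the reduction figure follows directly from the two loss values above. A short check of the arithmetic:

```python
initial_loss = 0.9303
final_loss = 0.3715

# Relative reduction in training loss, as a percentage of the initial value.
loss_reduction_pct = (initial_loss - final_loss) / initial_loss * 100
print(f"loss reduction: {loss_reduction_pct:.1f}%")
```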
## Bias, Risks, and Limitations
### Limitations
- **Dataset bias:** The model was trained only on Codeforces problems, so it may not generalize well to other competitive programming platforms
- **Language bias:** Solutions may favor programming patterns that are common in the training data
- **Size limitations:** As a 2B-parameter model, it may struggle with very complex algorithmic problems
- **Code correctness:** Generated code should always be tested and validated before use
### Recommendations
- Always test generated solutions with multiple test cases
- Use the model as a starting point, not a final solution
- Verify algorithmic correctness and time complexity
- Consider the model's suggestions as one approach among many possible solutions
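
One way to follow the first recommendation is a small harness that runs a candidate solution against known input/output pairs. The sketch below is illustrative only: `run_solution`, `check`, the candidate program, and the test cases are hypothetical and not part of this repository. It assumes the generated solution is a Python program that reads from standard input. Generated code is untrusted, so run it in a sandbox or container rather than directly on your machine.

```python
import subprocess
import sys


def run_solution(source: str, stdin_text: str, timeout_s: float = 10.0) -> str:
    """Run Python `source` in a fresh interpreter and return its stdout."""
    result = subprocess.run(
        [sys.executable, "-c", source],
        input=stdin_text,
        capture_output=True,
        text=True,
        timeout=timeout_s,
    )
    return result.stdout.strip()


def check(source: str, cases: list[tuple[str, str]]) -> list[tuple[str, str, str]]:
    """Return (input, expected, actual) for every failing test case."""
    failures = []
    for stdin_text, expected in cases:
        try:
            actual = run_solution(source, stdin_text)
        except subprocess.TimeoutExpired:
            actual = "<timeout>"
        if actual != expected.strip():
            failures.append((stdin_text, expected, actual))
    return failures


# Hypothetical candidate: Kadane's algorithm for the maximum-subarray problem.
candidate = """
nums = list(map(int, input().split()))
best = cur = nums[0]
for x in nums[1:]:
    cur = max(x, cur + x)
    best = max(best, cur)
print(best)
"""

# Include edge cases such as all-negative and single-element arrays.
cases = [
    ("-2 1 -3 4 -1 2 1 -5 4\n", "6"),
    ("-3 -1 -2\n", "-1"),
    ("5\n", "5"),
]

failures = check(candidate, cases)
print("all cases passed" if not failures else f"failed: {failures}")
```

The timeout doubles as a coarse time-limit check, although it does not replace reasoning about time complexity.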
## Environmental Impact
Training was conducted on a single Tesla T4 GPU for approximately 20 minutes, resulting in minimal environmental impact compared to larger scale training runs.
- **Hardware Type:** Tesla T4 GPU
- **Hours used:** ~0.33 (about 20 minutes)
- **Cloud Provider:** Kaggle
- **Compute Region:** Not specified
- **Carbon Emitted:** Minimal (estimated < 0.1 kg CO2eq)
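
The emissions bound above can be sanity-checked with a rough back-of-the-envelope estimate. This sketch counts GPU energy only and uses the T4's 70 W board power as an upper bound on average draw. The grid carbon intensity is an assumed placeholder, since the compute region is not specified.

```python
gpu_power_watts = 70.0     # NVIDIA T4 board power; upper bound on average draw
training_minutes = 20.0    # from this card
grid_kg_co2_per_kwh = 0.4  # ASSUMPTION: placeholder grid intensity, region unknown

# Energy in kWh = power (W) * time (h) / 1000.
energy_kwh = gpu_power_watts * (training_minutes / 60.0) / 1000.0
emissions_kg = energy_kwh * grid_kg_co2_per_kwh

print(f"energy:    {energy_kwh:.4f} kWh")
print(f"emissions: {emissions_kg:.4f} kg CO2eq")
```

Under these assumptions the estimate comes out roughly an order of magnitude below the 0.1 kg CO2eq bound, even before accounting for the GPU running below its maximum power.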
## Technical Specifications
### Model Architecture and Objective
- **Base Architecture:** Gemma-2 (2B parameters)
- **Fine-tuning Method:** LoRA (Low-Rank Adaptation)
- **Objective:** Causal language modeling with supervised fine-tuning
- **Quantization:** 4-bit quantization during training
### Compute Infrastructure
#### Hardware
- **GPU:** Tesla T4 (16GB VRAM)
- **Platform:** Kaggle Notebooks
#### Software
- **Framework:** PyTorch, Transformers, PEFT, TRL
- **Quantization:** bitsandbytes 4-bit
- **Training:** Supervised Fine-Tuning (SFT)
## Model Card Authors
Created by Aswith77 during fine-tuning experiments with competitive programming datasets.
## Model Card Contact
For questions or issues regarding this model, please open an issue in the model repository.