	---
	library_name: transformers
	tags:
	- trl
	- sft
	- code
	- competitive-programming
	- codeforces
	- lora
	license: mit
	datasets:
	- open-r1/codeforces-cots
	base_model:
	- google/gemma-2-2b-it
	pipeline_tag: text-generation
	---

	# Gemma-2-2b Fine-tuned for Competitive Programming

	This model is a fine-tuned version of [google/gemma-2-2b-it](https://huggingface.co/google/gemma-2-2b-it) on the [open-r1/codeforces-cots](https://huggingface.co/datasets/open-r1/codeforces-cots) dataset for competitive programming problem solving.

	## Model Details

	### Model Description

	This model has been fine-tuned using LoRA (Low-Rank Adaptation) on competitive programming problems from Codeforces. It's designed to help generate solutions for algorithmic and data structure problems commonly found in competitive programming contests.

	- Developed by: Aswith77
	- Model type: Causal Language Model (Code Generation)
	- Language(s): Python, C++, Java (primarily Python)
	- License: MIT
	- Finetuned from model: google/gemma-2-2b-it
	- Fine-tuning method: LoRA (Low-Rank Adaptation)

	### Model Sources

	- Repository: [Hugging Face Model](https://huggingface.co/Aswith77/gemma-2-2b-it-finetune-codeforces-cots)
	- Base Model: [google/gemma-2-2b-it](https://huggingface.co/google/gemma-2-2b-it)
	- Dataset: [open-r1/codeforces-cots](https://huggingface.co/datasets/open-r1/codeforces-cots)

	## Uses

	### Direct Use

	This model is intended for generating solutions to competitive programming problems, particularly those similar to Codeforces problems. It can:

	- Generate algorithmic solutions for given problem statements
	- Help with code completion for competitive programming
	- Assist in learning algorithmic problem-solving patterns

	### Downstream Use

	The model can be further fine-tuned for:
	- Specific programming languages
	- Domain-specific algorithmic problems
	- Educational coding platforms

	### Out-of-Scope Use

	This model should not be used for:
	- Production code without thorough testing
	- Security-critical applications
	- General-purpose software development without validation
	- Problems requiring real-world system design

	## How to Get Started with the Model

	```python
	from transformers import AutoModelForCausalLM, AutoTokenizer
	from peft import PeftModel
	import torch

	# Load the base model
	base_model = AutoModelForCausalLM.from_pretrained(
	    "google/gemma-2-2b-it",
	    torch_dtype=torch.float16,
	    device_map="auto",
	)

	# Load the fine-tuned LoRA adapters on top of the base model
	model = PeftModel.from_pretrained(
	    base_model,
	    "Aswith77/gemma-2-2b-it-finetune-codeforces-cots",
	)
	model.eval()

	# Load tokenizer
	tokenizer = AutoTokenizer.from_pretrained("google/gemma-2-2b-it")

	# Generate code for a problem
	problem = """
	Given an array of integers, find the maximum sum of a contiguous subarray.
	Input: [-2,1,-3,4,-1,2,1,-5,4]
	Output: 6 (subarray [4,-1,2,1])
	"""

	# Move the tokenized prompt to the same device as the model
	inputs = tokenizer(problem, return_tensors="pt").to(model.device)
	outputs = model.generate(
	    **inputs,
	    max_new_tokens=512,  # limits generated tokens only, not prompt + output
	    temperature=0.7,
	    do_sample=True,
	    pad_token_id=tokenizer.eos_token_id,
	)

	# The decoded text contains the prompt followed by the generated solution
	solution = tokenizer.decode(outputs[0], skip_special_tokens=True)
	print(solution)
	```

	## Training Details

	### Training Data

	The model was trained on the [open-r1/codeforces-cots](https://huggingface.co/datasets/open-r1/codeforces-cots) dataset, specifically using 1,000 competitive programming problems and their solutions from Codeforces.

	### Training Procedure

	#### Training Hyperparameters

	- Training regime: fp16 mixed precision
	- Learning rate: 2e-4
	- Batch size: 1 (per device)
	- Gradient accumulation steps: 2
	- Max steps: 100
	- Warmup steps: 5
	- Optimizer: AdamW 8-bit
	- Weight decay: 0.01
	- LoRA rank (r): 16
	- LoRA alpha: 32
	- LoRA dropout: 0.1
	- Target modules: q_proj, v_proj, k_proj, o_proj, gate_proj, up_proj, down_proj
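
As a rough sanity check on the hyperparameters above, the sketch below works out the effective batch size, the number of training examples the run would process, and the LoRA scaling factor. It assumes a single GPU and that the LoRA update is scaled by `alpha / r` (PEFT's default when rank-stabilized scaling is off); the variable names are illustrative and do not come from the training script.

```python
# Values copied from the hyperparameter list above.
per_device_batch_size = 1
gradient_accumulation_steps = 2
max_steps = 100
lora_rank = 16
lora_alpha = 32

# Assumption: a single GPU, so there is no data-parallel multiplier.
num_devices = 1

# Examples contributing to each optimizer step.
effective_batch_size = per_device_batch_size * gradient_accumulation_steps * num_devices

# Total examples processed over the whole run.
examples_processed = effective_batch_size * max_steps

# Assumption: the low-rank update is scaled by alpha / r.
lora_scaling = lora_alpha / lora_rank

print(f"Effective batch size: {effective_batch_size}")
print(f"Examples processed:   {examples_processed}")
print(f"LoRA scaling factor:  {lora_scaling}")
```

Under these assumptions the run processes about 200 examples, so with a 1,000-problem training subset only around a fifth of it would be seen within 100 steps.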

	#### Speeds, Sizes, Times

	- Training time: ~20 minutes
	- Hardware: Tesla T4 GPU (16GB)
	- Model size: ~30MB (LoRA adapters only)
	- Final training loss: 0.3715
	- Training samples per second: 0.338

	## Evaluation

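	The only metric reported is training loss, which fell from 0.9303 to 0.3715 over the run. No held-out validation loss or benchmark results (for example, pass rates on unseen Codeforces problems) are reported, so these figures show that optimization converged but do not measure solution correctness or rule out overfitting.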

	### Training Loss Progression

	- Initial loss: 0.9303
	- Final loss: 0.3715
	- Loss reduction: ~60%
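
The loss reduction figure can be reproduced with a few lines of arithmetic. The sketch below also converts each loss to a perplexity, which assumes the reported values are mean per-token cross-entropy in nats (the usual convention for causal language model training, but not stated in this card).

```python
import math

# Values reported in this card.
initial_loss = 0.9303
final_loss = 0.3715

# Relative reduction in training loss.
relative_reduction = (initial_loss - final_loss) / initial_loss

# Assumption: losses are mean per-token cross-entropy in nats,
# so perplexity is exp(loss).
initial_perplexity = math.exp(initial_loss)
final_perplexity = math.exp(final_loss)

print(f"Relative loss reduction: {relative_reduction:.1%}")
print(f"Training perplexity: {initial_perplexity:.2f} -> {final_perplexity:.2f}")
```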

	## Bias, Risks, and Limitations

	### Limitations

	- Dataset bias: Trained primarily on Codeforces problems, so it may not generalize well to other competitive programming platforms
	- Language bias: Solutions may favor certain programming patterns common in the training data
	- Size limitations: As a 2B-parameter model, it may struggle with very complex algorithmic problems
	- Code correctness: Generated code should always be tested and validated before use

	### Recommendations

	- Always test generated solutions with multiple test cases
	- Use the model as a starting point, not a final solution
	- Verify algorithmic correctness and time complexity
	- Consider the model's suggestions as one approach among many possible solutions
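
One way to act on the first recommendation is a small harness that runs a candidate solution against several input/output pairs. The following is a minimal sketch, not part of the model's tooling: `run_tests` is a hypothetical helper, and the `candidate` string is a hand-written stand-in for model output (Kadane's algorithm reading space-separated integers from standard input). Model-generated code is untrusted, so in practice it should run in a sandbox rather than directly on your machine.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_tests(solution_code, test_cases, timeout_s=5.0):
    """Run solution_code once per (stdin, expected_stdout) pair.

    Returns a list of booleans, one per test case. A test passes when the
    process exits cleanly and its output matches after whitespace splitting.
    """
    results = []
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "solution.py"
        path.write_text(solution_code)
        for stdin_text, expected in test_cases:
            try:
                proc = subprocess.run(
                    [sys.executable, str(path)],
                    input=stdin_text,
                    capture_output=True,
                    text=True,
                    timeout=timeout_s,
                )
                passed = proc.returncode == 0 and proc.stdout.split() == expected.split()
            except subprocess.TimeoutExpired:
                # Treat a time-limit violation as a failed test.
                passed = False
            results.append(passed)
    return results


# Hand-written stand-in for a model-generated solution (Kadane's algorithm).
candidate = """
nums = list(map(int, input().split()))
best = current = nums[0]
for x in nums[1:]:
    current = max(x, current + x)
    best = max(best, current)
print(best)
"""

tests = [
    ("-2 1 -3 4 -1 2 1 -5 4\n", "6"),  # example from this card
    ("-3 -1 -2\n", "-1"),              # all-negative edge case
    ("5\n", "5"),                      # single element
]

results = run_tests(candidate, tests)
print(results)
```

Edge cases such as all-negative arrays and single-element inputs are worth including because they are where plausible-looking solutions most often fail.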

	## Environmental Impact

	Training was conducted on a single Tesla T4 GPU for approximately 20 minutes, so its environmental impact is small compared with larger-scale training runs.

	- Hardware Type: Tesla T4 GPU
	- Hours used: 0.33 hours
	- Cloud Provider: Kaggle
	- Compute Region: Not specified
	- Carbon Emitted: Minimal (estimated < 0.1 kg CO2eq)
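
The carbon figure can be checked with a back-of-the-envelope estimate. The sketch below assumes the GPU draws its full 70 W board power for the whole run and uses an illustrative grid carbon intensity of 0.4 kg CO2eq per kWh; the real intensity depends on the unspecified compute region, and CPU, cooling, and other datacenter overhead are ignored.

```python
# Reported in this card.
hours_used = 0.33

# Assumption: the T4 runs at its 70 W board power limit throughout.
gpu_power_kw = 0.070

# Assumption: illustrative grid carbon intensity in kg CO2eq per kWh.
carbon_intensity_kg_per_kwh = 0.4

energy_kwh = gpu_power_kw * hours_used
emissions_kg = energy_kwh * carbon_intensity_kg_per_kwh

print(f"Energy used: {energy_kwh:.4f} kWh")
print(f"Estimated emissions: {emissions_kg:.4f} kg CO2eq")
```

Even with a carbon intensity several times higher than assumed here, the estimate stays well under the 0.1 kg CO2eq bound stated above.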

	## Technical Specifications

	### Model Architecture and Objective

	- Base Architecture: Gemma-2 (2B parameters)
	- Fine-tuning Method: LoRA (Low-Rank Adaptation)
	- Objective: Causal language modeling with supervised fine-tuning
	- Quantization: 4-bit quantization during training

	### Compute Infrastructure

	#### Hardware

	- GPU: Tesla T4 (16GB VRAM)
	- Platform: Kaggle Notebooks

	#### Software

	- Framework: PyTorch, Transformers, PEFT, TRL
	- Quantization: bitsandbytes 4-bit
	- Training: Supervised Fine-Tuning (SFT)
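
For readers who want to reproduce a similar setup, the configuration objects below mirror the LoRA hyperparameters and 4-bit quantization listed in this card. This is a sketch rather than the original training script: `task_type`, the fp16 compute dtype, and all other unlisted options are assumptions, and settings left out fall back to library defaults.

```python
import torch
from peft import LoraConfig
from transformers import BitsAndBytesConfig

# 4-bit quantization via bitsandbytes.
# Assumption: fp16 compute dtype, chosen to match the fp16 training regime.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)

# LoRA settings taken from the hyperparameter list in this card.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.1,
    target_modules=[
        "q_proj", "v_proj", "k_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    task_type="CAUSAL_LM",  # assumption: not stated in the card
)
```

`bnb_config` would typically be passed as `quantization_config` to `AutoModelForCausalLM.from_pretrained`, and `lora_config` as `peft_config` to TRL's `SFTTrainer`.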

	## Model Card Authors

	Created by Aswith77 during fine-tuning experiments with competitive programming datasets.

	## Model Card Contact

	For questions or issues regarding this model, please open an issue in the model repository.