---
library_name: transformers
tags:
- trl
- sft
- code
- competitive-programming
- codeforces
- lora
license: mit
datasets:
- open-r1/codeforces-cots
base_model:
- google/gemma-2-2b-it
pipeline_tag: text-generation
---

# Gemma-2-2b Fine-tuned for Competitive Programming

This model is a fine-tuned version of [google/gemma-2-2b-it](https://huggingface.co/google/gemma-2-2b-it) on the [open-r1/codeforces-cots](https://huggingface.co/datasets/open-r1/codeforces-cots) dataset for competitive programming problem solving.

## Model Details

### Model Description

This model has been fine-tuned using LoRA (Low-Rank Adaptation) on competitive programming problems from Codeforces. It is designed to help generate solutions for the algorithm and data structure problems commonly found in competitive programming contests.

- **Developed by:** Aswith77
- **Model type:** Causal Language Model (Code Generation)
- **Language(s):** Python, C++, Java (primarily Python)
- **License:** MIT
- **Finetuned from model:** google/gemma-2-2b-it
- **Fine-tuning method:** LoRA (Low-Rank Adaptation)

### Model Sources

- **Repository:** [Hugging Face Model](https://huggingface.co/Aswith77/gemma-2-2b-it-finetune-codeforces-cots)
- **Base Model:** [google/gemma-2-2b-it](https://huggingface.co/google/gemma-2-2b-it)
- **Dataset:** [open-r1/codeforces-cots](https://huggingface.co/datasets/open-r1/codeforces-cots)

## Uses

### Direct Use

This model is intended for generating solutions to competitive programming problems, particularly those similar to Codeforces problems. It can:

- Generate algorithmic solutions for given problem statements
- Help with code completion for competitive programming
- Assist in learning algorithmic problem-solving patterns

### Downstream Use

The model can be further fine-tuned on:

- Specific programming languages
- Domain-specific algorithmic problems
- Educational coding platforms

### Out-of-Scope Use

This model should not be used for:

- Production code without thorough testing
- Security-critical applications
- General-purpose software development without validation
- Problems requiring real-world system design

## How to Get Started with the Model

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

# Load the base model
base_model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2-2b-it",
    torch_dtype=torch.float16,
    device_map="auto"
)

# Load the fine-tuned LoRA adapters
model = PeftModel.from_pretrained(
    base_model,
    "Aswith77/gemma-2-2b-it-finetune-codeforces-cots"
)

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained("google/gemma-2-2b-it")

# Generate code for a problem
problem = """
Given an array of integers, find the maximum sum of a contiguous subarray.
Input: [-2,1,-3,4,-1,2,1,-5,4]
Output: 6 (subarray [4,-1,2,1])
"""

# Move inputs onto the same device as the model to avoid a device mismatch
inputs = tokenizer(problem, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_length=512,
    temperature=0.7,
    do_sample=True,
    pad_token_id=tokenizer.eos_token_id
)
solution = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(solution)
```

## Training Details

### Training Data

The model was trained on the [open-r1/codeforces-cots](https://huggingface.co/datasets/open-r1/codeforces-cots) dataset, specifically using 1,000 competitive programming problems and their solutions from Codeforces.
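As a rough illustration (not the exact preprocessing script), a 1,000-example subset can be drawn with the `datasets` library. The `solutions` config name and the shuffle seed below are assumptions; check the dataset card for the exact subset and column names.

```python
from datasets import load_dataset

# The "solutions" config name is an assumption; inspect the dataset card
# for the exact subset and column names if it differs.
dataset = load_dataset("open-r1/codeforces-cots", "solutions", split="train")

# Draw a 1,000-example subset, matching the size described above.
subset = dataset.shuffle(seed=42).select(range(1000))

print(subset)     # number of rows and column names
print(subset[0])  # one problem/solution example
```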
### Training Procedure

#### Training Hyperparameters

- **Training regime:** fp16 mixed precision
- **Learning rate:** 2e-4
- **Batch size:** 1 (per device)
- **Gradient accumulation steps:** 2
- **Max steps:** 100
- **Warmup steps:** 5
- **Optimizer:** AdamW 8-bit
- **Weight decay:** 0.01
- **LoRA rank (r):** 16
- **LoRA alpha:** 32
- **LoRA dropout:** 0.1
- **Target modules:** q_proj, v_proj, k_proj, o_proj, gate_proj, up_proj, down_proj

A hedged configuration sketch that reproduces these values is given in the appendix at the end of this card.

#### Speeds, Sizes, Times

- **Training time:** ~20 minutes
- **Hardware:** Tesla T4 GPU (16GB)
- **Model size:** ~30MB (LoRA adapters only)
- **Final training loss:** 0.3715
- **Training samples per second:** 0.338

## Evaluation

The training loss fell from an initial 0.9303 to a final 0.3715, and the loss curve showed steady improvement throughout training without signs of overfitting. Note that these figures reflect training loss only; no held-out benchmark evaluation is reported.

### Training Loss Progression

- Initial loss: 0.9303
- Final loss: 0.3715
- Loss reduction: ~60%

## Bias, Risks, and Limitations

### Limitations

- **Dataset bias:** Trained primarily on Codeforces problems; it may not generalize well to other competitive programming platforms
- **Language bias:** Solutions may favor programming patterns common in the training data
- **Size limitations:** As a 2B-parameter model, it may struggle with very complex algorithmic problems
- **Code correctness:** Generated code should always be tested and validated before use

### Recommendations

- Always test generated solutions with multiple test cases
- Use the model as a starting point, not a final solution
- Verify algorithmic correctness and time complexity
- Treat the model's suggestions as one approach among many possible solutions

## Environmental Impact

Training was conducted on a single Tesla T4 GPU for approximately 20 minutes, resulting in minimal environmental impact compared to larger-scale training runs.

- **Hardware Type:** Tesla T4 GPU
- **Hours used:** 0.33 hours
- **Cloud Provider:** Kaggle
- **Compute Region:** Not specified
- **Carbon Emitted:** Minimal (estimated < 0.1 kg CO2eq)

## Technical Specifications

### Model Architecture and Objective

- **Base Architecture:** Gemma-2 (2B parameters)
- **Fine-tuning Method:** LoRA (Low-Rank Adaptation)
- **Objective:** Causal language modeling with supervised fine-tuning
- **Quantization:** 4-bit quantization during training

### Compute Infrastructure

#### Hardware

- **GPU:** Tesla T4 (16GB VRAM)
- **Platform:** Kaggle Notebooks

#### Software

- **Framework:** PyTorch, Transformers, PEFT, TRL
- **Quantization:** bitsandbytes 4-bit
- **Training:** Supervised Fine-Tuning (SFT)

## Model Card Authors

Created by Aswith77 during fine-tuning experiments with competitive programming datasets.

## Model Card Contact

For questions or issues regarding this model, please open an issue in the model repository.
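## Appendix: Training Configuration Sketch

The following is a minimal sketch, assuming TRL's `SFTTrainer`/`SFTConfig` API, that mirrors the hyperparameters and 4-bit quantization described under Training Procedure and Technical Specifications. It is not the exact script used for this model: the `solutions` dataset config, the `output_dir`, and the assumption that the dataset exposes a column the trainer can consume directly (e.g. a chat-style `messages` field) are illustrative, and exact argument names can vary across TRL versions.

```python
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from trl import SFTConfig, SFTTrainer

# Load the base model in 4-bit, as described under "Technical Specifications".
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)
model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2-2b-it",
    quantization_config=bnb_config,
    device_map="auto",
)

# LoRA configuration matching the hyperparameters listed above.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.1,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    task_type="CAUSAL_LM",
)

# Training arguments matching the values listed above;
# "adamw_bnb_8bit" is the bitsandbytes 8-bit AdamW ("AdamW 8-bit" in the card).
training_args = SFTConfig(
    output_dir="gemma-2-2b-codeforces-lora",  # illustrative path
    per_device_train_batch_size=1,
    gradient_accumulation_steps=2,
    learning_rate=2e-4,
    max_steps=100,
    warmup_steps=5,
    weight_decay=0.01,
    optim="adamw_bnb_8bit",
    fp16=True,
    logging_steps=10,
)

# 1,000-example subset; the "solutions" config name is an assumption
# (see the Training Data section).
train_dataset = load_dataset(
    "open-r1/codeforces-cots", "solutions", split="train[:1000]"
)

trainer = SFTTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    peft_config=lora_config,
)
trainer.train()
trainer.save_model()  # saves the LoRA adapters to output_dir
```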