---
base_model: Qwen/Qwen2.5-Coder-0.5B-Instruct
tags:
- ellora
- lora
- long-context
- repository-understanding
- code-analysis
- progressive-training
- 2m-context
- unsloth
- vllm
- peft
library_name: peft
license: apache-2.0
language:
- en
pipeline_tag: text-generation
datasets:
- codelion/Qwen2.5-Coder-0.5B-Instruct-progressive-2M-context
---

# codelion/qwen2-5-coder-0-5b-instruct-progressive-2000k-lora

## 🚀 Progressive Context Extension to 2.0M Tokens

This is a progressive LoRA adapter that extends Qwen/Qwen2.5-Coder-0.5B-Instruct to handle **2.0 MILLION token** contexts through curriculum learning.

Part of the [Ellora project](https://github.com/codelion/ellora) - Recipe #4: Progressive Long Context Extension.

## 🎯 Key Features

- **Final Context**: 2,000,000 tokens (62x base model)
- **Training Method**: Hybrid approach with vLLM + Unsloth optimizations
- **Data Generation**: vLLM for 10x+ faster task generation
- **Training**: Unsloth for memory-efficient progressive training
- **Single Adapter**: One LoRA handles all context lengths up to 2000K
- **Use Cases**:
  - Entire codebase analysis
  - Multi-repository understanding
  - Large-scale code generation
  - Cross-file dependency analysis

## 📊 Training Progression

The model was trained progressively through these stages:

- Stage 1: 32K tokens (loss: 0.4882)
- Stage 2: 128K tokens (loss: 0.0641)
- Stage 3: 512K tokens (loss: 0.1327)
- Stage 4: 2000K tokens (loss: 0.0484)

### Performance Metrics

- **Final Training Loss**: 0.0484
- **Total Training Time**: 0.17 hours
- **Peak Memory Usage**: 4.7 GB
- **LoRA Rank**: 64
- **LoRA Alpha**: 128

## 🔧 Usage with Unsloth

```python
from unsloth import FastLanguageModel
from transformers import TextStreamer

# Load model with Unsloth (automatically handles 2M context!)
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="codelion/qwen2-5-coder-0-5b-instruct-progressive-2000k-lora",
    max_seq_length=2000000,
    dtype=None,  # Auto-detect
    load_in_4bit=True,
)

# Enable native fast generation
FastLanguageModel.for_inference(model)

# Example: Analyze a large codebase
prompt = """Repository Context:
[Your repository content up to 2000K tokens]

Question: Analyze the overall architecture and provide improvement suggestions.

Answer:"""

inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=2000000)
streamer = TextStreamer(tokenizer)

outputs = model.generate(
    **inputs,
    streamer=streamer,
    max_new_tokens=1024,
    temperature=0.7,
    do_sample=True
)
```

## 🔧 Usage with Transformers

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

# Load base model
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-Coder-0.5B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
    attn_implementation="flash_attention_2"
)

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-Coder-0.5B-Instruct")

# Load the progressive adapter
model = PeftModel.from_pretrained(model, "codelion/qwen2-5-coder-0-5b-instruct-progressive-2000k-lora")

# Now you can use contexts up to 2000K tokens!
```

## 📈 Progressive Training Details

This adapter was trained using a progressive curriculum approach with hybrid optimizations:

1. **Stage 1 (32K)**: Basic file-level understanding
2. **Stage 2 (128K)**: Multi-file repository comprehension
3. **Stage 3 (512K)**: Large repository analysis
4. **Stage 4 (2M)**: Massive codebase understanding

Each stage included data from all previous stages, allowing the model to maintain and build upon its learned capabilities.
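The loop below is a minimal sketch of this cumulative curriculum, not the actual training script: it assumes the dataset exposes hypothetical `text` and `context_length` columns, and it uses the plain Hugging Face `Trainer` for brevity, whereas the real run relied on Unsloth's kernels, 4-bit loading, and gradient checkpointing to fit the longer stages. The LoRA settings mirror the configuration listed below (rank 64, alpha 128, RSLoRA).

```python
import torch
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

BASE = "Qwen/Qwen2.5-Coder-0.5B-Instruct"
STAGES = [32_768, 131_072, 524_288, 2_000_000]  # 32K -> 128K -> 512K -> 2000K

tokenizer = AutoTokenizer.from_pretrained(BASE)
model = AutoModelForCausalLM.from_pretrained(
    BASE, torch_dtype=torch.bfloat16, device_map="auto"
)

# Single LoRA adapter reused across all stages (rank 64, alpha 128, RSLoRA).
model = get_peft_model(model, LoraConfig(
    r=64, lora_alpha=128, use_rslora=True, task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
))

# Assumed schema: "text" holds the sample, "context_length" its token count.
dataset = load_dataset(
    "codelion/Qwen2.5-Coder-0.5B-Instruct-progressive-2M-context", split="train"
)

for stage, max_len in enumerate(STAGES, start=1):
    # Cumulative curriculum: every stage also keeps all shorter examples.
    stage_data = dataset.filter(lambda ex: ex["context_length"] <= max_len)
    tokenized = stage_data.map(
        lambda ex: tokenizer(ex["text"], truncation=True, max_length=max_len),
        remove_columns=stage_data.column_names,
    )
    trainer = Trainer(
        model=model,
        args=TrainingArguments(
            output_dir=f"stage_{stage}",
            per_device_train_batch_size=1,
            gradient_accumulation_steps=4,
            learning_rate=2e-4,
            num_train_epochs=1,
            bf16=True,
            logging_steps=1,
            report_to="none",
        ),
        train_dataset=tokenized,
        data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
    )
    trainer.train()  # the same adapter keeps accumulating across stages

model.save_pretrained("progressive-2000k-lora")
```

The stage losses reported above come from the actual Unsloth run; this sketch only illustrates how the cumulative data mix and the single adapter are carried from one stage to the next.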
## 🛠️ Training Configuration

```yaml
Progressive Stages: 32K → 128K → 512K → 2000K
Final Context: 2000K tokens
Base Model: Qwen/Qwen2.5-Coder-0.5B-Instruct
Data Generation: vLLM (fast batch inference)
Training: Unsloth (memory-efficient training)
LoRA Rank: 64
LoRA Alpha: 128
Learning Rate: 0.0002
Batch Size: 1
Gradient Accumulation: 4
```

## 🚀 Optimizations Used

### Data Generation (vLLM)

- **Batch Generation**: Process multiple prompts simultaneously
- **Optimized Memory**: GPU memory utilization tuning
- **Fast Inference**: 10x+ faster than sequential generation

### Training (Unsloth)

- **Custom CUDA Kernels**: 2-5x training speedup
- **Flash Attention 2**: Efficient attention computation
- **Gradient Checkpointing**: Memory-efficient backprop
- **4-bit Quantization**: Reduced memory footprint
- **RSLoRA**: Rank-stabilized LoRA for better convergence

## 📊 Evaluation Tasks

The model excels at:

- Complete repository architectural analysis
- Cross-file dependency tracing
- Large-scale refactoring suggestions
- Security vulnerability detection across entire codebases
- Test coverage analysis
- Documentation generation for entire projects

## 🏆 Achievements

- Successfully extended context from 32K → 2000K tokens
- Hybrid optimization: vLLM for generation + Unsloth for training
- Single adapter handles all context lengths
- Memory-efficient training on a single H100 GPU
- Real repository understanding, not just synthetic data

## 🔗 Links

- **GitHub**: [Ellora Recipe #4](https://github.com/codelion/ellora)
- **Dataset**: [codelion/Qwen2.5-Coder-0.5B-Instruct-progressive-2M-context](https://huggingface.co/datasets/codelion/Qwen2.5-Coder-0.5B-Instruct-progressive-2M-context)

---

*This model is part of the Ellora project - standardized recipes for enhancing LLM capabilities.*