---
base_model: Qwen/Qwen2.5-Coder-0.5B-Instruct
tags:
- ellora
- lora
- long-context
- repository-understanding
- code-analysis
- progressive-training
- 2m-context
- unsloth
- vllm
- peft
library_name: peft
license: apache-2.0
language:
- en
pipeline_tag: text-generation
datasets:
- codelion/Qwen2.5-Coder-0.5B-Instruct-progressive-2M-context
---

# codelion/qwen2-5-coder-0-5b-instruct-progressive-2000k-lora

## 🚀 Progressive Context Extension to 2.0M Tokens

This is a progressive LoRA adapter that extends Qwen/Qwen2.5-Coder-0.5B-Instruct to handle **2.0 MILLION token** contexts through curriculum learning.

Part of the [Ellora project](https://github.com/codelion/ellora) - Recipe #4: Progressive Long Context Extension.

## 🎯 Key Features

- **Final Context**: 2,000,000 tokens (62x base model)
- **Training Method**: Hybrid approach with vLLM + Unsloth optimizations
- **Data Generation**: vLLM for 10x+ faster task generation
- **Training**: Unsloth for memory-efficient progressive training
- **Single Adapter**: One LoRA handles all context lengths up to 2000K
- **Use Cases**:
  - Entire codebase analysis
  - Multi-repository understanding
  - Large-scale code generation
  - Cross-file dependency analysis

## 📊 Training Progression

The model was trained progressively through these stages:

- Stage 1: 32K tokens (loss: 0.4882)
- Stage 2: 128K tokens (loss: 0.0641)
- Stage 3: 512K tokens (loss: 0.1327)
- Stage 4: 2000K tokens (loss: 0.0484)

### Performance Metrics

- **Final Training Loss**: 0.0484
- **Total Training Time**: 0.17 hours
- **Peak Memory Usage**: 4.7 GB
- **LoRA Rank**: 64
- **LoRA Alpha**: 128

## 🔧 Usage with Unsloth

```python
from unsloth import FastLanguageModel
from transformers import TextStreamer

# Load model with Unsloth (automatically handles 2M context!)
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="codelion/qwen2-5-coder-0-5b-instruct-progressive-2000k-lora",
    max_seq_length=2000000,
    dtype=None,  # Auto-detect
    load_in_4bit=True,
)

# Enable native fast generation
FastLanguageModel.for_inference(model)

# Example: Analyze a large codebase
prompt = """Repository Context:
[Your repository content up to 2000K tokens]

Question: Analyze the overall architecture and provide improvement suggestions.

Answer:"""

inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=2000000)
streamer = TextStreamer(tokenizer)

outputs = model.generate(
    **inputs,
    streamer=streamer,
    max_new_tokens=1024,
    temperature=0.7,
    do_sample=True
)
```

## 🔧 Usage with Transformers

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

# Load base model
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-Coder-0.5B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
    attn_implementation="flash_attention_2"
)

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-Coder-0.5B-Instruct")

# Load the progressive adapter
model = PeftModel.from_pretrained(model, "codelion/qwen2-5-coder-0-5b-instruct-progressive-2000k-lora")

# Now you can use contexts up to 2000K tokens!
```

## 📈 Progressive Training Details

This adapter was trained using a progressive curriculum approach with hybrid optimizations:

1. **Stage 1 (32K)**: Basic file-level understanding
2. **Stage 2 (128K)**: Multi-file repository comprehension
3. **Stage 3 (512K)**: Large repository analysis
4. **Stage 4 (2M)**: Massive codebase understanding

Each stage included data from all previous stages, allowing the model to maintain and build upon its learned capabilities.
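The loop below is a minimal sketch of this cumulative curriculum, not the actual training script: it assumes the dataset exposes hypothetical `text` and `context_length` columns, and it uses the plain Hugging Face `Trainer` for brevity, whereas the real run relied on Unsloth's kernels, 4-bit loading, and gradient checkpointing to fit the longer stages. The LoRA settings mirror the configuration listed below (rank 64, alpha 128, RSLoRA).

```python
import torch
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

BASE = "Qwen/Qwen2.5-Coder-0.5B-Instruct"
STAGES = [32_768, 131_072, 524_288, 2_000_000]  # 32K -> 128K -> 512K -> 2000K

tokenizer = AutoTokenizer.from_pretrained(BASE)
model = AutoModelForCausalLM.from_pretrained(
    BASE, torch_dtype=torch.bfloat16, device_map="auto"
)

# Single LoRA adapter reused across all stages (rank 64, alpha 128, RSLoRA).
model = get_peft_model(model, LoraConfig(
    r=64, lora_alpha=128, use_rslora=True, task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
))

# Assumed schema: "text" holds the sample, "context_length" its token count.
dataset = load_dataset(
    "codelion/Qwen2.5-Coder-0.5B-Instruct-progressive-2M-context", split="train"
)

for stage, max_len in enumerate(STAGES, start=1):
    # Cumulative curriculum: every stage also keeps all shorter examples.
    stage_data = dataset.filter(lambda ex: ex["context_length"] <= max_len)
    tokenized = stage_data.map(
        lambda ex: tokenizer(ex["text"], truncation=True, max_length=max_len),
        remove_columns=stage_data.column_names,
    )
    trainer = Trainer(
        model=model,
        args=TrainingArguments(
            output_dir=f"stage_{stage}",
            per_device_train_batch_size=1,
            gradient_accumulation_steps=4,
            learning_rate=2e-4,
            num_train_epochs=1,
            bf16=True,
            logging_steps=1,
            report_to="none",
        ),
        train_dataset=tokenized,
        data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
    )
    trainer.train()  # the same adapter keeps accumulating across stages

model.save_pretrained("progressive-2000k-lora")
```

The stage losses reported above come from the actual Unsloth run; this sketch only illustrates how the cumulative data mix and the single adapter are carried from one stage to the next.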
## 🛠️ Training Configuration

```yaml
Progressive Stages: 32K → 128K → 512K → 2000K
Final Context: 2000K tokens
Base Model: Qwen/Qwen2.5-Coder-0.5B-Instruct
Data Generation: vLLM (fast batch inference)
Training: Unsloth (memory-efficient training)
LoRA Rank: 64
LoRA Alpha: 128
Learning Rate: 0.0002
Batch Size: 1
Gradient Accumulation: 4
```

## 🚀 Optimizations Used

### Data Generation (vLLM)

- **Batch Generation**: Process multiple prompts simultaneously
- **Optimized Memory**: GPU memory utilization tuning
- **Fast Inference**: 10x+ faster than sequential generation

### Training (Unsloth)

- **Custom CUDA Kernels**: 2-5x training speedup
- **Flash Attention 2**: Efficient attention computation
- **Gradient Checkpointing**: Memory-efficient backprop
- **4-bit Quantization**: Reduced memory footprint
- **RSLoRA**: Rank-stabilized LoRA for better convergence

## 📊 Evaluation Tasks

The model excels at:

- Complete repository architectural analysis
- Cross-file dependency tracing
- Large-scale refactoring suggestions
- Security vulnerability detection across entire codebases
- Test coverage analysis
- Documentation generation for entire projects

## 🏆 Achievements

- Successfully extended context from 32K → 2000K tokens
- Hybrid optimization: vLLM for generation + Unsloth for training
- Single adapter handles all context lengths
- Memory-efficient training on a single H100 GPU
- Real repository understanding, not just synthetic data

## 🔗 Links

- **GitHub**: [Ellora Recipe #4](https://github.com/codelion/ellora)
- **Dataset**: [codelion/Qwen2.5-Coder-0.5B-Instruct-progressive-2M-context](https://huggingface.co/datasets/codelion/Qwen2.5-Coder-0.5B-Instruct-progressive-2M-context)

---

*This model is part of the Ellora project - standardized recipes for enhancing LLM capabilities.*