---
base_model: Qwen/Qwen2.5-Coder-0.5B-Instruct
tags:
- ellora
- lora
- long-context
- repository-understanding
- code-analysis
- progressive-training
- 2m-context
- unsloth
- vllm
- peft
library_name: peft
license: apache-2.0
language:
- en
pipeline_tag: text-generation
datasets:
- codelion/Qwen2.5-Coder-0.5B-Instruct-progressive-2M-context
---
# codelion/qwen2-5-coder-0-5b-instruct-progressive-2000k-lora
## 🚀 Progressive Context Extension to 2.0M Tokens
This is a progressive LoRA adapter that extends Qwen/Qwen2.5-Coder-0.5B-Instruct to handle **2.0 MILLION token** contexts through curriculum learning.
Part of the [Ellora project](https://github.com/codelion/ellora) - Recipe #4: Progressive Long Context Extension.
## 🎯 Key Features
- **Final Context**: 2,000,000 tokens (62x the base model's 32K context window)
- **Training Method**: Hybrid approach with vLLM + Unsloth optimizations
- **Data Generation**: vLLM for 10x+ faster task generation
- **Training**: Unsloth for memory-efficient progressive training
- **Single Adapter**: One LoRA handles all context lengths up to 2000K
- **Use Cases**:
  - Entire codebase analysis
  - Multi-repository understanding
  - Large-scale code generation
  - Cross-file dependency analysis
## 📊 Training Progression
The model was trained progressively through these stages:
- Stage 1: 32K tokens (loss: 0.4882)
- Stage 2: 128K tokens (loss: 0.0641)
- Stage 3: 512K tokens (loss: 0.1327)
- Stage 4: 2000K tokens (loss: 0.0484)
### Performance Metrics
- **Final Training Loss**: 0.0484
- **Total Training Time**: 0.17 hours
- **Peak Memory Usage**: 4.7 GB
- **LoRA Rank**: 64
- **LoRA Alpha**: 128
## 🔧 Usage with Unsloth
```python
from unsloth import FastLanguageModel
from transformers import TextStreamer

# Load model with Unsloth (automatically handles the 2M-token context)
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="codelion/qwen2-5-coder-0-5b-instruct-progressive-2000k-lora",
    max_seq_length=2000000,
    dtype=None,  # Auto-detect
    load_in_4bit=True,
)

# Enable native fast generation
FastLanguageModel.for_inference(model)

# Example: Analyze a large codebase
prompt = """Repository Context:
[Your repository content up to 2000K tokens]
Question: Analyze the overall architecture and provide improvement suggestions.
Answer:"""

inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=2000000)
inputs = {k: v.to(model.device) for k, v in inputs.items()}  # move inputs to the model's device
streamer = TextStreamer(tokenizer)

outputs = model.generate(
    **inputs,
    streamer=streamer,
    max_new_tokens=1024,
    temperature=0.7,
    do_sample=True,
)
```
## 🔧 Usage with Transformers
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

# Load the base model
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-Coder-0.5B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
    attn_implementation="flash_attention_2",
)

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-Coder-0.5B-Instruct")

# Load the progressive adapter on top of the base model
model = PeftModel.from_pretrained(
    model,
    "codelion/qwen2-5-coder-0-5b-instruct-progressive-2000k-lora",
)
# Now you can use contexts up to 2000K tokens!
```
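Once the adapter is loaded, generation works like any other `transformers` model. Below is a minimal sketch, assuming a placeholder prompt (not from the training data) and default sampling settings:
```python
# Minimal generation example; tune max_new_tokens and sampling as needed.
prompt = "Repository Context:\n[repository content]\n\nQuestion: Summarize the main modules.\nAnswer:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512, temperature=0.7, do_sample=True)

# Decode only the newly generated tokens, skipping the prompt
answer = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(answer)
```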
## 📈 Progressive Training Details
This adapter was trained using a novel progressive curriculum approach with hybrid optimizations:
1. **Stage 1 (32K)**: Basic file-level understanding
2. **Stage 2 (128K)**: Multi-file repository comprehension
3. **Stage 3 (512K)**: Large repository analysis
4. **Stage 4 (2M)**: Massive codebase understanding
Each stage included data from all previous stages, allowing the model to maintain and build upon its learned capabilities.
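The data mixing is cumulative rather than a replacement: a minimal sketch of the idea (placeholder samples, not the actual training script) is shown below.
```python
# Illustrative only: each stage trains on its own samples plus all earlier stages' samples.
STAGE_LENGTHS = [32_000, 128_000, 512_000, 2_000_000]

def build_curriculum(samples_by_stage):
    """Return, for each stage, the cumulative pool of samples from stages 1..N."""
    pools, pool = [], []
    for stage_samples in samples_by_stage:
        pool = pool + stage_samples   # carry earlier-stage data forward
        pools.append(list(pool))
    return pools

pools = build_curriculum([["file-level"], ["multi-file"], ["large-repo"], ["massive-repo"]])
assert len(pools[-1]) == 4            # the final 2M stage still sees all earlier data
```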
## πŸ› οΈ Training Configuration
```yaml
Progressive Stages: 32K → 128K → 512K → 2000K
Final Context: 2000K tokens
Base Model: Qwen/Qwen2.5-Coder-0.5B-Instruct
Data Generation: vLLM (fast batch inference)
Training: Unsloth (memory-efficient training)
LoRA Rank: 64
LoRA Alpha: 128
Learning Rate: 0.0002
Batch Size: 1
Gradient Accumulation: 4
```
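For reference, a PEFT configuration matching these hyperparameters might look like the sketch below; the `target_modules` list is an assumption (the usual Qwen2 attention and MLP projections), not something taken from the actual training run:
```python
from peft import LoraConfig

lora_config = LoraConfig(
    r=64,                    # LoRA rank from the table above
    lora_alpha=128,          # LoRA alpha from the table above
    target_modules=[         # assumed: typical Qwen2 projection layers
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    lora_dropout=0.0,
    bias="none",
    use_rslora=True,         # rank-stabilized LoRA (see Optimizations below)
    task_type="CAUSAL_LM",
)
```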
## 🚀 Optimizations Used
### Data Generation (vLLM)
- **Batch Generation**: Process multiple prompts simultaneously
- **Optimized Memory**: GPU memory utilization tuning
- **Fast Inference**: 10x+ faster than sequential generation
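A rough sketch of batched generation with vLLM is shown below; the actual prompts, generator model, and sampling settings used to build the dataset are assumptions here:
```python
from vllm import LLM, SamplingParams

# Load a generator once; gpu_memory_utilization is the memory-tuning knob mentioned above.
llm = LLM(model="Qwen/Qwen2.5-Coder-0.5B-Instruct", gpu_memory_utilization=0.90)
sampling = SamplingParams(temperature=0.7, max_tokens=512)

# All prompts are processed in one batched call instead of sequentially.
prompts = [f"Write an analysis question about module {i} of this repository." for i in range(8)]
for output in llm.generate(prompts, sampling):
    print(output.outputs[0].text[:80])
```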
### Training (Unsloth)
- **Custom CUDA Kernels**: 2-5x training speedup
- **Flash Attention 2**: Efficient attention computation
- **Gradient Checkpointing**: Memory-efficient backprop
- **4-bit Quantization**: Reduced memory footprint
- **RSLoRA**: Rank-stabilized LoRA for better convergence
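In Unsloth, these options are typically enabled when attaching the LoRA adapter. The sketch below shows the general pattern with the hyperparameters listed earlier; it is not the exact training script, and the `target_modules` choice is an assumption:
```python
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="Qwen/Qwen2.5-Coder-0.5B-Instruct",
    max_seq_length=32768,                   # raised at each curriculum stage
    load_in_4bit=True,                      # 4-bit quantization
    dtype=None,
)
model = FastLanguageModel.get_peft_model(
    model,
    r=64,
    lora_alpha=128,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    use_gradient_checkpointing="unsloth",   # memory-efficient backprop
    use_rslora=True,                        # rank-stabilized LoRA
)
```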
## 📊 Evaluation Tasks
The model excels at:
- Complete repository architectural analysis
- Cross-file dependency tracing
- Large-scale refactoring suggestions
- Security vulnerability detection across entire codebases
- Test coverage analysis
- Documentation generation for entire projects
## πŸ† Achievements
- Successfully extended context from 32K → 2000K tokens
- Hybrid optimization: vLLM for generation + Unsloth for training
- Single adapter handles all context lengths
- Memory-efficient training on single H100 GPU
- Real repository understanding, not just synthetic data
## 🔗 Links
- **GitHub**: [Ellora Recipe #4](https://github.com/codelion/ellora)
- **Dataset**: [codelion/Qwen2.5-Coder-0.5B-Instruct-progressive-2M-context](https://huggingface.co/datasets/codelion/Qwen2.5-Coder-0.5B-Instruct-progressive-2M-context)
---
*This model is part of the Ellora project - standardized recipes for enhancing LLM capabilities.*