---
base_model: Qwen/Qwen2.5-Coder-0.5B-Instruct
tags:
- ellora
- lora
- long-context
- repository-understanding
- code-analysis
- progressive-training
- 2m-context
- unsloth
- vllm
- peft
library_name: peft
license: apache-2.0
language:
- en
pipeline_tag: text-generation
datasets:
- codelion/Qwen2.5-Coder-0.5B-Instruct-progressive-2M-context
---

# codelion/qwen2-5-coder-0-5b-instruct-progressive-2000k-lora

## Progressive Context Extension to 2.0M Tokens

This progressive LoRA adapter extends Qwen/Qwen2.5-Coder-0.5B-Instruct to handle contexts of up to **2.0 million tokens** through curriculum learning.

Part of the [Ellora project](https://github.com/codelion/ellora) - Recipe #4: Progressive Long Context Extension.

## Key Features

- **Final Context**: 2,000,000 tokens (roughly 62x the base model's 32K window)
- **Training Method**: Hybrid approach with vLLM + Unsloth optimizations
- **Data Generation**: vLLM for 10x+ faster task generation
- **Training**: Unsloth for memory-efficient progressive training
- **Single Adapter**: One LoRA handles all context lengths up to 2000K
- **Use Cases**:
  - Entire codebase analysis
  - Multi-repository understanding
  - Large-scale code generation
  - Cross-file dependency analysis

## Training Progression

The model was trained progressively through these stages:

- Stage 1: 32K tokens (loss: 0.4882)
- Stage 2: 128K tokens (loss: 0.0641)
- Stage 3: 512K tokens (loss: 0.1327)
- Stage 4: 2000K tokens (loss: 0.0484)

### Performance Metrics

- **Final Training Loss**: 0.0484
- **Total Training Time**: 0.17 hours
- **Peak Memory Usage**: 4.7 GB
- **LoRA Rank**: 64
- **LoRA Alpha**: 128

## Usage with Unsloth

```python
from unsloth import FastLanguageModel
from transformers import TextStreamer

# Load the adapter with Unsloth (automatically handles the 2M-token context)
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="codelion/qwen2-5-coder-0-5b-instruct-progressive-2000k-lora",
    max_seq_length=2000000,
    dtype=None,  # auto-detect
    load_in_4bit=True,
)

# Enable native fast generation
FastLanguageModel.for_inference(model)

# Example: analyze a large codebase
prompt = """Repository Context:
[Your repository content up to 2000K tokens]

Question: Analyze the overall architecture and provide improvement suggestions.

Answer:"""

inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=2000000).to(model.device)
streamer = TextStreamer(tokenizer)

outputs = model.generate(
    **inputs,
    streamer=streamer,
    max_new_tokens=1024,
    temperature=0.7,
    do_sample=True,
)
```

## Usage with Transformers

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

# Load the base model
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-Coder-0.5B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
    attn_implementation="flash_attention_2",
)

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-Coder-0.5B-Instruct")

# Load the progressive adapter
model = PeftModel.from_pretrained(model, "codelion/qwen2-5-coder-0-5b-instruct-progressive-2000k-lora")

# The model can now handle contexts up to 2000K tokens (see the example below)
```
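The snippet above only loads the adapter. The continuation below is a minimal inference sketch; the prompt text, generation settings, and the output-trimming step are illustrative choices, not values taken from the training recipe.

```python
# Continues from the block above: run a short generation with the adapted model
prompt = """Repository Context:
[Your repository content]

Question: Summarize the main modules and how they interact.

Answer:"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    temperature=0.7,
    do_sample=True,
)

# Strip the prompt tokens and print only the newly generated answer
answer = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(answer)
```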
## Progressive Training Details

This adapter was trained using a progressive curriculum approach with hybrid optimizations:

1. **Stage 1 (32K)**: Basic file-level understanding
2. **Stage 2 (128K)**: Multi-file repository comprehension
3. **Stage 3 (512K)**: Large repository analysis
4. **Stage 4 (2M)**: Massive codebase understanding

Each stage included data from all previous stages, allowing the model to maintain and build upon its earlier capabilities (see the sketch below).
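A minimal sketch of what this cumulative curriculum can look like in code. The stage lengths mirror the numbers above, but `load_stage_dataset` and `train_stage` are hypothetical placeholders for the recipe's actual data loading and Unsloth training loop, which are not published on this card.

```python
from datasets import concatenate_datasets

# Stage lengths mirror the curriculum above (32K -> 128K -> 512K -> 2000K tokens)
STAGES = [32_000, 128_000, 512_000, 2_000_000]

def run_progressive_training(model, tokenizer):
    seen = []
    for max_len in STAGES:
        # Each stage contributes new, longer-context examples...
        seen.append(load_stage_dataset(max_len))      # hypothetical helper
        # ...but training runs over everything seen so far, so earlier skills are retained
        mixed = concatenate_datasets(seen)
        model = train_stage(model, tokenizer, mixed,  # hypothetical helper
                            max_seq_length=max_len)
    return model
```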
## Training Configuration

```yaml
Progressive Stages: 32K → 128K → 512K → 2000K
Final Context: 2000K tokens
Base Model: Qwen/Qwen2.5-Coder-0.5B-Instruct
Data Generation: vLLM (fast batch inference)
Training: Unsloth (memory-efficient training)
LoRA Rank: 64
LoRA Alpha: 128
Learning Rate: 0.0002
Batch Size: 1
Gradient Accumulation: 4
```
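The optimizer-side values in this table map directly onto standard `transformers.TrainingArguments`. The sketch below shows only that mapping; the output directory, epoch count, and precision flags are assumptions, not the exact settings used for this adapter.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="outputs",            # assumption: any local path
    per_device_train_batch_size=1,   # Batch Size: 1
    gradient_accumulation_steps=4,   # Gradient Accumulation: 4
    learning_rate=2e-4,              # Learning Rate: 0.0002
    num_train_epochs=1,              # assumption
    bf16=True,                       # assumption: bf16-capable GPU
    logging_steps=10,
)
```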
## Optimizations Used

### Data Generation (vLLM)

- **Batch Generation**: Process multiple prompts simultaneously (see the sketch below)
- **Optimized Memory**: GPU memory utilization tuning
- **Fast Inference**: 10x+ faster than sequential generation
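As an illustration of the batch-generation setup, the sketch below uses the public vLLM API to generate training tasks for many code snippets in one call. The prompt template and `code_snippets` list are placeholders; the actual generation prompts live in the training pipeline, not on this card.

```python
from vllm import LLM, SamplingParams

# Placeholder inputs; in practice these are chunks of real repositories
code_snippets = ["def add(a, b):\n    return a + b", "class Cache:\n    ..."]

# One engine instance serves the whole batch; gpu_memory_utilization bounds VRAM use
llm = LLM(model="Qwen/Qwen2.5-Coder-0.5B-Instruct", gpu_memory_utilization=0.9)
sampling = SamplingParams(temperature=0.7, max_tokens=512)

# All prompts are submitted together, which is where the batching speedup comes from
prompts = [f"Write a repository-understanding question about this code:\n{s}" for s in code_snippets]
for request_output in llm.generate(prompts, sampling):
    print(request_output.outputs[0].text)
```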
### Training (Unsloth)

- **Custom CUDA Kernels**: 2-5x training speedup
- **Flash Attention 2**: Efficient attention computation
- **Gradient Checkpointing**: Memory-efficient backpropagation
- **4-bit Quantization**: Reduced memory footprint
- **RSLoRA**: Rank-stabilized LoRA for better convergence (see the adapter setup sketch below)
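The sketch below shows how these training-side options are typically combined with Unsloth's `get_peft_model`, using the rank and alpha from this card. The target modules and the per-stage sequence length are assumptions for illustration; consult the adapter's config for the exact values.

```python
from unsloth import FastLanguageModel

# Base model in 4-bit; max_seq_length grows with each curriculum stage
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="Qwen/Qwen2.5-Coder-0.5B-Instruct",
    max_seq_length=32768,   # stage 1; later stages raise this (assumption)
    load_in_4bit=True,
)

# Rank-64 LoRA with RSLoRA and Unsloth's gradient checkpointing
model = FastLanguageModel.get_peft_model(
    model,
    r=64,
    lora_alpha=128,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],  # assumption
    use_rslora=True,                       # rank-stabilized LoRA
    use_gradient_checkpointing="unsloth",  # memory-efficient backprop
)
```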
## Evaluation Tasks

The adapter targets tasks such as:

- Complete repository architectural analysis
- Cross-file dependency tracing
- Large-scale refactoring suggestions
- Security vulnerability detection across entire codebases
- Test coverage analysis
- Documentation generation for entire projects

## Achievements

- Successfully extended context from 32K → 2000K tokens
- Hybrid optimization: vLLM for data generation + Unsloth for training
- Single adapter handles all context lengths
- Memory-efficient training on a single H100 GPU
- Real repository understanding, not just synthetic data

## Links

- **GitHub**: [Ellora Recipe #4](https://github.com/codelion/ellora)
- **Dataset**: [codelion/Qwen2.5-Coder-0.5B-Instruct-progressive-2M-context](https://huggingface.co/datasets/codelion/Qwen2.5-Coder-0.5B-Instruct-progressive-2M-context)

---

*This model is part of the Ellora project - standardized recipes for enhancing LLM capabilities.*