---
base_model: Qwen/Qwen2.5-Coder-0.5B-Instruct
tags:
- ellora
- lora
- long-context
- repository-understanding
- code-analysis
- progressive-training
- 2m-context
- unsloth
- vllm
- peft
library_name: peft
license: apache-2.0
language:
- en
pipeline_tag: text-generation
datasets:
- codelion/Qwen2.5-Coder-0.5B-Instruct-progressive-2M-context
---
# codelion/qwen2-5-coder-0-5b-instruct-progressive-2000k-lora
## Progressive Context Extension to 2.0M Tokens
This is a progressive LoRA adapter that extends Qwen/Qwen2.5-Coder-0.5B-Instruct to handle **2.0 MILLION token** contexts through curriculum learning.
Part of the [Ellora project](https://github.com/codelion/ellora) - Recipe #4: Progressive Long Context Extension.
## Key Features
- **Final Context**: 2,000,000 tokens (~62x the base model's 32K context window)
- **Training Method**: Hybrid approach with vLLM + Unsloth optimizations
- **Data Generation**: vLLM for 10x+ faster task generation
- **Training**: Unsloth for memory-efficient progressive training
- **Single Adapter**: One LoRA handles all context lengths up to 2000K
- **Use Cases**:
  - Entire codebase analysis
  - Multi-repository understanding
  - Large-scale code generation
  - Cross-file dependency analysis
## Training Progression
The model was trained progressively through these stages:
- Stage 1: 32K tokens (loss: 0.4882)
- Stage 2: 128K tokens (loss: 0.0641)
- Stage 3: 512K tokens (loss: 0.1327)
- Stage 4: 2000K tokens (loss: 0.0484)
### Performance Metrics
- **Final Training Loss**: 0.0484
- **Total Training Time**: 0.17 hours
- **Peak Memory Usage**: 4.7 GB
- **LoRA Rank**: 64
- **LoRA Alpha**: 128
## Usage with Unsloth
```python
from unsloth import FastLanguageModel
from transformers import TextStreamer

# Load the adapter with Unsloth (automatically handles the 2M-token context)
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="codelion/qwen2-5-coder-0-5b-instruct-progressive-2000k-lora",
    max_seq_length=2000000,
    dtype=None,  # Auto-detect
    load_in_4bit=True,
)

# Enable Unsloth's fast inference mode
FastLanguageModel.for_inference(model)

# Example: analyze a large codebase
prompt = """Repository Context:
[Your repository content up to 2000K tokens]
Question: Analyze the overall architecture and provide improvement suggestions.
Answer:"""

inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=2000000).to(model.device)
streamer = TextStreamer(tokenizer)

outputs = model.generate(
    **inputs,
    streamer=streamer,
    max_new_tokens=1024,
    temperature=0.7,
    do_sample=True,
)
```
## Usage with Transformers
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

# Load the base model (flash_attention_2 requires the flash-attn package;
# drop the argument to fall back to the default attention implementation)
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-Coder-0.5B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
    attn_implementation="flash_attention_2",
)

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-Coder-0.5B-Instruct")

# Load the progressive LoRA adapter on top of the base model
model = PeftModel.from_pretrained(model, "codelion/qwen2-5-coder-0-5b-instruct-progressive-2000k-lora")

# The model now accepts contexts up to 2000K tokens
```
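Once the adapter is loaded, generation works like any other `transformers` causal LM. A minimal sketch (the prompt here is just a placeholder):

```python
prompt = "Repository Context:\n[repository content]\n\nQuestion: Summarize the main modules.\n\nAnswer:"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=512, temperature=0.7, do_sample=True)

# Decode only the newly generated tokens
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```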
## Progressive Training Details
This adapter was trained using a novel progressive curriculum approach with hybrid optimizations:
1. **Stage 1 (32K)**: Basic file-level understanding
2. **Stage 2 (128K)**: Multi-file repository comprehension
3. **Stage 3 (512K)**: Large repository analysis
4. **Stage 4 (2M)**: Massive codebase understanding
Each stage included data from all previous stages, allowing the model to maintain and build upon its learned capabilities.
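The stage mixing can be sketched as follows. This is only an illustration of the curriculum schedule, not the exact training script; `load_stage_dataset` and `train_one_stage` are hypothetical helpers standing in for the real data pipeline and Unsloth training loop:

```python
from datasets import concatenate_datasets

# Hypothetical helpers: load_stage_dataset() returns examples capped at a given
# context length; train_one_stage() runs one training pass on the shared LoRA adapter.
STAGES = [32_000, 128_000, 512_000, 2_000_000]

seen = []
for max_len in STAGES:
    seen.append(load_stage_dataset(max_len))              # new examples for this stage
    mixed = concatenate_datasets(seen)                     # replay data from earlier stages
    train_one_stage(model, mixed, max_seq_length=max_len)  # same adapter across all stages
```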
## Training Configuration
```yaml
Progressive Stages: 32K → 128K → 512K → 2000K
Final Context: 2000K tokens
Base Model: Qwen/Qwen2.5-Coder-0.5B-Instruct
Data Generation: vLLM (fast batch inference)
Training: Unsloth (memory-efficient training)
LoRA Rank: 64
LoRA Alpha: 128
Learning Rate: 0.0002
Batch Size: 1
Gradient Accumulation: 4
```
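As a rough mapping of the optimizer-side settings above onto standard `transformers` arguments (a sketch, not the actual training script; the effective batch size is 1 × 4 = 4 sequences):

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="progressive-2000k-lora",  # assumed output path
    learning_rate=2e-4,                   # Learning Rate: 0.0002
    per_device_train_batch_size=1,        # Batch Size: 1
    gradient_accumulation_steps=4,        # Gradient Accumulation: 4
    bf16=True,                            # assumed precision, matching the bfloat16 usage above
    num_train_epochs=1,                   # assumed; the per-stage schedule is not specified
)
```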
## Optimizations Used
### Data Generation (vLLM)
- **Batch Generation**: Process multiple prompts simultaneously
- **Optimized Memory**: GPU memory utilization tuning
- **Fast Inference**: 10x+ faster than sequential generation (see the sketch below)
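A minimal sketch of what batched task generation with vLLM looks like; the prompts and model name below are placeholders, not the exact generation pipeline:

```python
from vllm import LLM, SamplingParams

# Placeholder prompts; the real pipeline builds these from repository chunks
prompts = [f"Write a code-analysis task for repository chunk {i}." for i in range(64)]

llm = LLM(model="Qwen/Qwen2.5-Coder-0.5B-Instruct", gpu_memory_utilization=0.9)
sampling = SamplingParams(temperature=0.7, max_tokens=512)

# All prompts are processed in one batched call instead of one request at a time
outputs = llm.generate(prompts, sampling)
for out in outputs:
    print(out.outputs[0].text[:80])
```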
### Training (Unsloth)
- **Custom CUDA Kernels**: 2-5x training speedup
- **Flash Attention 2**: Efficient attention computation
- **Gradient Checkpointing**: Memory-efficient backprop
- **4-bit Quantization**: Reduced memory footprint
- **RSLoRA**: Rank-stabilized LoRA for better convergence (see the sketch below)
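A hedged sketch of how these options are typically enabled with Unsloth; the values mirror the configuration above, and the `target_modules` list is an assumption rather than the confirmed training setting:

```python
from unsloth import FastLanguageModel

# 4-bit base-model load (reduced memory footprint)
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="Qwen/Qwen2.5-Coder-0.5B-Instruct",
    max_seq_length=32_000,  # raised stage by stage during the curriculum
    dtype=None,
    load_in_4bit=True,
)

# Attach the LoRA adapter with rank-stabilized scaling and Unsloth's
# long-sequence gradient checkpointing
model = FastLanguageModel.get_peft_model(
    model,
    r=64,
    lora_alpha=128,
    use_rslora=True,
    use_gradient_checkpointing="unsloth",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed projection set
)
```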
## Evaluation Tasks
The model excels at:
- Complete repository architectural analysis
- Cross-file dependency tracing
- Large-scale refactoring suggestions
- Security vulnerability detection across entire codebases
- Test coverage analysis
- Documentation generation for entire projects
## Achievements
- Successfully extended context from 32K → 2000K tokens
- Hybrid optimization: vLLM for generation + Unsloth for training
- Single adapter handles all context lengths
- Memory-efficient training on single H100 GPU
- Real repository understanding, not just synthetic data
## Links
- **GitHub**: [Ellora Recipe #4](https://github.com/codelion/ellora)
- **Dataset**: [codelion/Qwen2.5-Coder-0.5B-Instruct-progressive-2M-context](https://huggingface.co/datasets/codelion/Qwen2.5-Coder-0.5B-Instruct-progressive-2M-context)
---
*This model is part of the Ellora project - standardized recipes for enhancing LLM capabilities.*