---
base_model: Qwen/Qwen2.5-Coder-0.5B-Instruct
tags:
- ellora
- lora
- long-context
- repository-understanding
- code-analysis
- progressive-training
- 2m-context
- unsloth
- vllm
- peft
library_name: peft
license: apache-2.0
language:
- en
pipeline_tag: text-generation
datasets:
- codelion/Qwen2.5-Coder-0.5B-Instruct-progressive-2M-context
---

# codelion/qwen2-5-coder-0-5b-instruct-progressive-2000k-lora

## πŸš€ Progressive Context Extension to 2.0M Tokens

This is a progressive LoRA adapter that extends Qwen/Qwen2.5-Coder-0.5B-Instruct to handle **2.0 MILLION token** contexts through curriculum learning.

Part of the [Ellora project](https://github.com/codelion/ellora) - Recipe #4: Progressive Long Context Extension.

## 🎯 Key Features

- **Final Context**: 2,000,000 tokens (~61x the base model's 32K window)
- **Training Method**: Hybrid approach with vLLM + Unsloth optimizations
- **Data Generation**: vLLM for 10x+ faster task generation
- **Training**: Unsloth for memory-efficient progressive training
- **Single Adapter**: One LoRA handles all context lengths up to 2000K
- **Use Cases**: 
  - Entire codebase analysis
  - Multi-repository understanding
  - Large-scale code generation
  - Cross-file dependency analysis

## πŸ“Š Training Progression

The model was trained progressively through these stages:
- Stage 1: 32K tokens (loss: 0.4882)
- Stage 2: 128K tokens (loss: 0.0641)
- Stage 3: 512K tokens (loss: 0.1327)
- Stage 4: 2000K tokens (loss: 0.0484)

### Performance Metrics
- **Final Training Loss**: 0.0484
- **Total Training Time**: 0.17 hours (about 10 minutes)
- **Peak Memory Usage**: 4.7 GB
- **LoRA Rank**: 64
- **LoRA Alpha**: 128

## πŸ”§ Usage with Unsloth

```python
from unsloth import FastLanguageModel
from transformers import TextStreamer

# Load model with Unsloth (automatically handles 2M context!)
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="codelion/qwen2-5-coder-0-5b-instruct-progressive-2000k-lora",
    max_seq_length=2000000,
    dtype=None,  # Auto-detect
    load_in_4bit=True,
)

# Enable native fast generation
FastLanguageModel.for_inference(model)

# Example: Analyze a large codebase
prompt = """Repository Context:
[Your repository content up to 2000K tokens]

Question: Analyze the overall architecture and provide improvement suggestions.

Answer:"""

inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=2000000)
streamer = TextStreamer(tokenizer)

outputs = model.generate(
    **inputs,
    streamer=streamer,
    max_new_tokens=1024,
    temperature=0.7,
    do_sample=True
)
```

## πŸ”§ Usage with Transformers

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

# Load base model
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-Coder-0.5B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
    attn_implementation="flash_attention_2"
)

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-Coder-0.5B-Instruct")

# Load the progressive adapter
model = PeftModel.from_pretrained(model, "codelion/qwen2-5-coder-0-5b-instruct-progressive-2000k-lora")

# Now you can use contexts up to 2000K tokens!
```
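
To complete the Transformers example, a minimal generation call with the adapter loaded might look like the sketch below (the prompt text, `max_new_tokens`, and sampling settings are illustrative placeholders, not values from the training recipe):

```python
# Build a long-context prompt (placeholder content) and generate
prompt = """Repository Context:
[Your repository content]

Question: Summarize the main modules and how they interact.

Answer:"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    temperature=0.7,
    do_sample=True,
)

# Decode only the newly generated tokens
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```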

## πŸ“ˆ Progressive Training Details

This adapter was trained using a novel progressive curriculum approach with hybrid optimizations:

1. **Stage 1 (32K)**: Basic file-level understanding
2. **Stage 2 (128K)**: Multi-file repository comprehension  
3. **Stage 3 (512K)**: Large repository analysis
4. **Stage 4 (2M)**: Massive codebase understanding

Each stage included data from all previous stages, allowing the model to maintain and build upon its learned capabilities.
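
Conceptually, the cumulative curriculum can be sketched as follows (the stage boundaries come from the list above; the dataset-mixing code itself is an illustration, not the exact training script):

```python
# Illustrative sketch of a cumulative curriculum: each stage trains on its own
# examples plus all examples from the earlier (shorter-context) stages.
STAGE_LENGTHS = [32_000, 128_000, 512_000, 2_000_000]  # max context per stage

def build_stage_dataset(examples_by_stage, stage_index):
    """Return the training examples for a stage, including all previous stages."""
    dataset = []
    for length in STAGE_LENGTHS[: stage_index + 1]:
        dataset.extend(examples_by_stage[length])
    return dataset

# Example: stage 3 (512K) trains on the 32K, 128K, and 512K examples together
# stage_3_data = build_stage_dataset(examples_by_stage, stage_index=2)
```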

## πŸ› οΈ Training Configuration

```yaml
Progressive Stages: 32K β†’ 128K β†’ 512K β†’ 2000K
Final Context: 2000K tokens
Base Model: Qwen/Qwen2.5-Coder-0.5B-Instruct
Data Generation: vLLM (fast batch inference)
Training: Unsloth (memory-efficient training)
LoRA Rank: 64
LoRA Alpha: 128
Learning Rate: 0.0002
Batch Size: 1
Gradient Accumulation: 4
```
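
Expressed as a PEFT configuration, these hyperparameters would look roughly like the sketch below (the target modules and any settings not listed in the table are assumptions, not taken from the actual training script):

```python
from peft import LoraConfig
from transformers import TrainingArguments

# LoRA settings from the table above; target_modules is an assumption
lora_config = LoraConfig(
    r=64,
    lora_alpha=128,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed
    lora_dropout=0.0,
    bias="none",
    task_type="CAUSAL_LM",
)

# Optimizer/schedule settings from the table above
training_args = TrainingArguments(
    output_dir="progressive-2000k-lora",
    learning_rate=2e-4,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=4,
    bf16=True,  # assumed precision
)
```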

## πŸš€ Optimizations Used

### Data Generation (vLLM)
- **Batch Generation**: Process multiple prompts simultaneously
- **Optimized Memory**: GPU memory utilization tuning
- **Fast Inference**: 10x+ faster than sequential generation (see the sketch below)
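
A minimal sketch of this batched generation pattern using vLLM (the model name, prompts, and sampling parameters here are illustrative, not the exact data-generation script):

```python
from vllm import LLM, SamplingParams

# Load the base model once; vLLM batches the prompts internally for throughput
llm = LLM(
    model="Qwen/Qwen2.5-Coder-0.5B-Instruct",
    gpu_memory_utilization=0.90,  # tune for the available GPU memory
)

sampling_params = SamplingParams(temperature=0.7, max_tokens=512)

# Many task prompts are generated in a single batched call
prompts = [
    "Write a question about the architecture of this repository: ...",
    "Write a question about cross-file dependencies in this repository: ...",
]
outputs = llm.generate(prompts, sampling_params)

for output in outputs:
    print(output.outputs[0].text)
```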

### Training (Unsloth)
- **Custom CUDA Kernels**: 2-5x training speedup
- **Flash Attention 2**: Efficient attention computation
- **Gradient Checkpointing**: Memory-efficient backprop
- **4-bit Quantization**: Reduced memory footprint
- **RSLoRA**: Rank-stabilized LoRA for better convergence (see the Unsloth sketch below)
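
How these options map onto Unsloth's API, as a hedged sketch (the target modules and the per-stage sequence length are assumptions; Unsloth applies its fused kernels and efficient attention when the model is loaded this way):

```python
from unsloth import FastLanguageModel

# 4-bit base model load; max_seq_length was grown stage by stage during training
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="Qwen/Qwen2.5-Coder-0.5B-Instruct",
    max_seq_length=32768,  # first-stage length; assumed starting point
    load_in_4bit=True,
)

# Attach the LoRA adapter with rank-stabilized scaling and Unsloth's
# memory-efficient gradient checkpointing (target_modules are assumed)
model = FastLanguageModel.get_peft_model(
    model,
    r=64,
    lora_alpha=128,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_dropout=0,
    bias="none",
    use_gradient_checkpointing="unsloth",
    use_rslora=True,
)
```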

## πŸ“Š Evaluation Tasks

The model excels at:
- Complete repository architectural analysis
- Cross-file dependency tracing
- Large-scale refactoring suggestions
- Security vulnerability detection across entire codebases
- Test coverage analysis
- Documentation generation for entire projects

## πŸ† Achievements

- Successfully extended context from 32K β†’ 2000K tokens
- Hybrid optimization: vLLM for generation + Unsloth for training
- Single adapter handles all context lengths
- Memory-efficient training on single H100 GPU
- Real repository understanding, not just synthetic data

## πŸ”— Links

- **GitHub**: [Ellora Recipe #4](https://github.com/codelion/ellora)
- **Dataset**: [codelion/Qwen2.5-Coder-0.5B-Instruct-progressive-2M-context](https://huggingface.co/datasets/codelion/Qwen2.5-Coder-0.5B-Instruct-progressive-2M-context)

---

*This model is part of the Ellora project - standardized recipes for enhancing LLM capabilities.*