---
base_model: Qwen/Qwen2.5-Coder-0.5B-Instruct
tags:
- ellora
- lora
- long-context
- repository-understanding
- code-analysis
- progressive-training
- 2m-context
- unsloth
- vllm
- peft
library_name: peft
license: apache-2.0
language:
- en
pipeline_tag: text-generation
datasets:
- codelion/Qwen2.5-Coder-0.5B-Instruct-progressive-2M-context
---
# codelion/qwen2-5-coder-0-5b-instruct-progressive-2000k-lora
## 🚀 Progressive Context Extension to 2.0M Tokens
This is a progressive LoRA adapter that extends Qwen/Qwen2.5-Coder-0.5B-Instruct to handle **2.0 MILLION token** contexts through curriculum learning.
Part of the [Ellora project](https://github.com/codelion/ellora) - Recipe #4: Progressive Long Context Extension.
## 🎯 Key Features
- **Final Context**: 2,000,000 tokens (62x the base model's 32K context window)
- **Training Method**: Hybrid approach with vLLM + Unsloth optimizations
- **Data Generation**: vLLM for 10x+ faster task generation
- **Training**: Unsloth for memory-efficient progressive training
- **Single Adapter**: One LoRA handles all context lengths up to 2000K
- **Use Cases**:
  - Entire codebase analysis
  - Multi-repository understanding
  - Large-scale code generation
  - Cross-file dependency analysis
## 📊 Training Progression
The model was trained progressively through these stages:
- Stage 1: 32K tokens (loss: 0.4882)
- Stage 2: 128K tokens (loss: 0.0641)
- Stage 3: 512K tokens (loss: 0.1327)
- Stage 4: 2000K tokens (loss: 0.0484)
### Performance Metrics
- **Final Training Loss**: 0.0484
- **Total Training Time**: 0.17 hours
- **Peak Memory Usage**: 4.7 GB
- **LoRA Rank**: 64
- **LoRA Alpha**: 128
## 🔧 Usage with Unsloth
```python
from unsloth import FastLanguageModel
from transformers import TextStreamer

# Load model with Unsloth (automatically handles the 2M-token context)
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="codelion/qwen2-5-coder-0-5b-instruct-progressive-2000k-lora",
    max_seq_length=2000000,
    dtype=None,  # Auto-detect
    load_in_4bit=True,
)

# Enable native fast generation
FastLanguageModel.for_inference(model)

# Example: Analyze a large codebase
prompt = """Repository Context:
[Your repository content up to 2000K tokens]
Question: Analyze the overall architecture and provide improvement suggestions.
Answer:"""

inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=2000000)
inputs = {k: v.to(model.device) for k, v in inputs.items()}  # move inputs to the model's device
streamer = TextStreamer(tokenizer)

outputs = model.generate(
    **inputs,
    streamer=streamer,
    max_new_tokens=1024,
    temperature=0.7,
    do_sample=True,
)
```
## 🔧 Usage with Transformers
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

# Load the base model
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-Coder-0.5B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
    attn_implementation="flash_attention_2",
)

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-Coder-0.5B-Instruct")

# Load the progressive adapter on top of the base model
model = PeftModel.from_pretrained(
    model,
    "codelion/qwen2-5-coder-0-5b-instruct-progressive-2000k-lora",
)
# Now you can use contexts up to 2000K tokens!
```
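Once the adapter is loaded, generation works like any other `transformers` model. Below is a minimal sketch, assuming a placeholder prompt (not from the training data) and default sampling settings:
```python
# Minimal generation example; tune max_new_tokens and sampling as needed.
prompt = "Repository Context:\n[repository content]\n\nQuestion: Summarize the main modules.\nAnswer:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512, temperature=0.7, do_sample=True)

# Decode only the newly generated tokens, skipping the prompt
answer = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(answer)
```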
## 📈 Progressive Training Details
This adapter was trained using a novel progressive curriculum approach with hybrid optimizations:
1. **Stage 1 (32K)**: Basic file-level understanding
2. **Stage 2 (128K)**: Multi-file repository comprehension
3. **Stage 3 (512K)**: Large repository analysis
4. **Stage 4 (2M)**: Massive codebase understanding
Each stage included data from all previous stages, allowing the model to maintain and build upon its learned capabilities.
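The data mixing is cumulative rather than a replacement: a minimal sketch of the idea (placeholder samples, not the actual training script) is shown below.
```python
# Illustrative only: each stage trains on its own samples plus all earlier stages' samples.
STAGE_LENGTHS = [32_000, 128_000, 512_000, 2_000_000]

def build_curriculum(samples_by_stage):
    """Return, for each stage, the cumulative pool of samples from stages 1..N."""
    pools, pool = [], []
    for stage_samples in samples_by_stage:
        pool = pool + stage_samples   # carry earlier-stage data forward
        pools.append(list(pool))
    return pools

pools = build_curriculum([["file-level"], ["multi-file"], ["large-repo"], ["massive-repo"]])
assert len(pools[-1]) == 4            # the final 2M stage still sees all earlier data
```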
## πŸ› οΈ Training Configuration
```yaml
Progressive Stages: 32K → 128K → 512K → 2000K
Final Context: 2000K tokens
Base Model: Qwen/Qwen2.5-Coder-0.5B-Instruct
Data Generation: vLLM (fast batch inference)
Training: Unsloth (memory-efficient training)
LoRA Rank: 64
LoRA Alpha: 128
Learning Rate: 0.0002
Batch Size: 1
Gradient Accumulation: 4
```
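For reference, a PEFT configuration matching these hyperparameters might look like the sketch below; the `target_modules` list is an assumption (the usual Qwen2 attention and MLP projections), not something taken from the actual training run:
```python
from peft import LoraConfig

lora_config = LoraConfig(
    r=64,                    # LoRA rank from the table above
    lora_alpha=128,          # LoRA alpha from the table above
    target_modules=[         # assumed: typical Qwen2 projection layers
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    lora_dropout=0.0,
    bias="none",
    use_rslora=True,         # rank-stabilized LoRA (see Optimizations below)
    task_type="CAUSAL_LM",
)
```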
## 🚀 Optimizations Used
### Data Generation (vLLM)
- **Batch Generation**: Process multiple prompts simultaneously
- **Optimized Memory**: GPU memory utilization tuning
- **Fast Inference**: 10x+ faster than sequential generation
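A rough sketch of batched generation with vLLM is shown below; the actual prompts, generator model, and sampling settings used to build the dataset are assumptions here:
```python
from vllm import LLM, SamplingParams

# Load a generator once; gpu_memory_utilization is the memory-tuning knob mentioned above.
llm = LLM(model="Qwen/Qwen2.5-Coder-0.5B-Instruct", gpu_memory_utilization=0.90)
sampling = SamplingParams(temperature=0.7, max_tokens=512)

# All prompts are processed in one batched call instead of sequentially.
prompts = [f"Write an analysis question about module {i} of this repository." for i in range(8)]
for output in llm.generate(prompts, sampling):
    print(output.outputs[0].text[:80])
```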
### Training (Unsloth)
- **Custom CUDA Kernels**: 2-5x training speedup
- **Flash Attention 2**: Efficient attention computation
- **Gradient Checkpointing**: Memory-efficient backprop
- **4-bit Quantization**: Reduced memory footprint
- **RSLoRA**: Rank-stabilized LoRA for better convergence
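In Unsloth, these options are typically enabled when attaching the LoRA adapter. The sketch below shows the general pattern with the hyperparameters listed earlier; it is not the exact training script, and the `target_modules` choice is an assumption:
```python
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="Qwen/Qwen2.5-Coder-0.5B-Instruct",
    max_seq_length=32768,                   # raised at each curriculum stage
    load_in_4bit=True,                      # 4-bit quantization
    dtype=None,
)
model = FastLanguageModel.get_peft_model(
    model,
    r=64,
    lora_alpha=128,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    use_gradient_checkpointing="unsloth",   # memory-efficient backprop
    use_rslora=True,                        # rank-stabilized LoRA
)
```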
## 📊 Evaluation Tasks
The model excels at:
- Complete repository architectural analysis
- Cross-file dependency tracing
- Large-scale refactoring suggestions
- Security vulnerability detection across entire codebases
- Test coverage analysis
- Documentation generation for entire projects
## πŸ† Achievements
- Successfully extended context from 32K → 2000K tokens
- Hybrid optimization: vLLM for generation + Unsloth for training
- Single adapter handles all context lengths
- Memory-efficient training on single H100 GPU
- Real repository understanding, not just synthetic data
## 🔗 Links
- **GitHub**: [Ellora Recipe #4](https://github.com/codelion/ellora)
- **Dataset**: [codelion/Qwen2.5-Coder-0.5B-Instruct-progressive-2M-context](https://huggingface.co/datasets/codelion/Qwen2.5-Coder-0.5B-Instruct-progressive-2M-context)
---
*This model is part of the Ellora project - standardized recipes for enhancing LLM capabilities.*