This is a progressive LoRA adapter that extends Qwen/Qwen2.5-Coder-0.5B-Instruct to handle 2 million (2000K) token contexts through curriculum learning.
Part of the Ellora project - Recipe #4: Progressive Long Context Extension.
The model was trained progressively through these stages: 32K → 128K → 512K → 2000K tokens, with each stage building on the previous one.
To run inference with Unsloth:
from unsloth import FastLanguageModel
from transformers import TextStreamer
# Load model with Unsloth (automatically handles 2M context!)
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="codelion/qwen2-5-coder-0-5b-instruct-progressive-2000k-lora",
    max_seq_length=2000000,
    dtype=None,  # Auto-detect
    load_in_4bit=True,
)
# Enable native fast generation
FastLanguageModel.for_inference(model)
# Example: Analyze a large codebase
prompt = """Repository Context:
[Your repository content up to 2000K tokens]
Question: Analyze the overall architecture and provide improvement suggestions.
Answer:"""
inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=2000000)
streamer = TextStreamer(tokenizer)
outputs = model.generate(
    **inputs,
    streamer=streamer,
    max_new_tokens=1024,
    temperature=0.7,
    do_sample=True,
)
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch
# Load base model
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-Coder-0.5B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
    attn_implementation="flash_attention_2",
)
# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-Coder-0.5B-Instruct")
# Load the progressive adapter
model = PeftModel.from_pretrained(model, "codelion/qwen2-5-coder-0-5b-instruct-progressive-2000k-lora")
# Now you can use contexts up to 2000K tokens!
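A minimal generation example with the PEFT-loaded model follows; the prompt below is only a placeholder, so substitute your own long-context input:
# Build a prompt (replace the bracketed placeholder with real repository content)
prompt = "Repository Context:\n[Your repository content]\n\nQuestion: Summarize the main modules.\nAnswer:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
# Generate and print only the newly produced tokens
outputs = model.generate(**inputs, max_new_tokens=512, temperature=0.7, do_sample=True)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))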
This adapter was trained using a novel progressive curriculum approach with hybrid optimizations: vLLM for fast data generation and Unsloth for memory-efficient training. Each stage included data from all previous stages, allowing the model to maintain and build upon previously learned capabilities. Training configuration:
Progressive Stages: 32K → 128K → 512K → 2000K
Final Context: 2000K tokens
Base Model: Qwen/Qwen2.5-Coder-0.5B-Instruct
Data Generation: vLLM (fast batch inference)
Training: Unsloth (memory-efficient training)
LoRA Rank: 64
LoRA Alpha: 128
Learning Rate: 0.0002
Batch Size: 1
Gradient Accumulation: 4
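The training recipe itself is not included in this card, but the following is a minimal sketch of what a cumulative curriculum loop with these hyperparameters could look like. The actual recipe uses Unsloth; this sketch uses plain PEFT and the Transformers Trainer for clarity, and load_stage_dataset (assumed to return tokenized examples with labels) and the target_modules list are assumptions, not part of the released recipe.
from datasets import concatenate_datasets
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, AutoTokenizer, Trainer, TrainingArguments
import torch

base = "Qwen/Qwen2.5-Coder-0.5B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype=torch.bfloat16, device_map="auto")

# LoRA config matching the card: rank 64, alpha 128 (target_modules is an assumption)
lora_cfg = LoraConfig(
    r=64,
    lora_alpha=128,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_cfg)

stages = [32_000, 128_000, 512_000, 2_000_000]  # progressive context lengths
seen = []  # cumulative data: each stage reuses all previous stages

for max_len in stages:
    seen.append(load_stage_dataset(max_len))   # hypothetical data loader
    train_data = concatenate_datasets(seen)
    args = TrainingArguments(
        output_dir=f"ckpt-{max_len}",
        per_device_train_batch_size=1,   # Batch Size: 1
        gradient_accumulation_steps=4,   # Gradient Accumulation: 4
        learning_rate=2e-4,              # Learning Rate: 0.0002
        num_train_epochs=1,
        bf16=True,
    )
    Trainer(model=model, args=args, train_dataset=train_data).train()

model.save_pretrained("qwen2-5-coder-0-5b-instruct-progressive-2000k-lora")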
The model excels at long-context tasks over very large inputs, such as repository-level code analysis and question answering, as in the example above.
This model is part of the Ellora project - standardized recipes for enhancing LLM capabilities.