# 🦙 TinyLlama Instruction-Tuned Models: LoRA, AdaLoRA, QLoRA
This repo hosts a set of TinyLlama-1.1B models fine-tuned with several parameter-efficient fine-tuning (PEFT) methods:
- ✅ LoRA (Low-Rank Adaptation)
- ✅ AdaLoRA (Adaptive Low-Rank Adaptation with rank scheduling)
- ✅ QLoRA (Quantized LoRA for low-memory environments)
These models are fine-tuned on a custom instruction-response dataset for general-purpose instruction-following.
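For reference, all three adapters can be expressed as Hugging Face `peft` configurations. The sketch below illustrates what each setup looks like; the hyperparameter values (`r`, `lora_alpha`, `target_modules`, and the AdaLoRA rank schedule) are illustrative assumptions, not the exact settings used to train these checkpoints.

```python
# Minimal sketch of the three PEFT setups, using Hugging Face `peft`.
# All hyperparameter values below are illustrative assumptions, not the
# exact settings used to train these checkpoints.
from peft import LoraConfig, AdaLoraConfig

# LoRA: fixed low-rank update on the attention projections
lora_cfg = LoraConfig(
    r=8,                                  # assumed rank
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # assumed target modules
    task_type="CAUSAL_LM",
)

# AdaLoRA: starts at init_r and prunes ranks down to target_r on a schedule
adalora_cfg = AdaLoraConfig(
    init_r=12,
    target_r=4,
    tinit=100,        # steps before rank pruning starts
    tfinal=500,       # steps of final tuning after pruning ends
    deltaT=10,        # prune every deltaT steps
    total_step=1000,  # assumed total training steps
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)

# QLoRA uses the same LoRA adapter trained on top of a 4-bit quantized
# base model (see the 4-bit loading sketch in the inference section).
```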
## 📦 Model Variants
| Name | Folder Name | Method | Notes |
|---|---|---|---|
| LoRA | `lora-tinyllama-final` | LoRA | Standard fine-tuned model |
| AdaLoRA | `adalora-tinyllama-final` | AdaLoRA | Rank-adaptive LoRA |
| QLoRA | `qlora-tinyllama-final` | QLoRA | Quantized LoRA (int4) |
## 🧠 Base Model
- Base: `TinyLlama/TinyLlama-1.1B-Chat-v1.0`
- Tokenizer: SentencePiece, with `eos_token` used as the padding token
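The adapters expect prompts in the simple instruction template below, the same one applied by the `ask()` helper in the inference example:

```
### Instruction:
{your question}

### Response:
```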
## 🚀 Inference Example
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

base_model = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
lora_dir = "lora-tinyllama-final"  # or "adalora-tinyllama-final", "qlora-tinyllama-final"

# Tokenizer (saved alongside the adapter); reuse EOS as the padding token
tokenizer = AutoTokenizer.from_pretrained(lora_dir)
tokenizer.pad_token = tokenizer.eos_token

# Load the base model, attach the adapter, then merge the LoRA weights
# into the base weights so inference runs without the PEFT wrapper
base = AutoModelForCausalLM.from_pretrained(base_model, torch_dtype=torch.float16, device_map="auto")
model = PeftModel.from_pretrained(base, lora_dir)
model = model.merge_and_unload()
model.eval()

def ask(prompt):
    # Wrap the user prompt in the instruction template used during fine-tuning
    prompt = f"### Instruction:\n{prompt}\n\n### Response:\n"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        output = model.generate(**inputs, max_new_tokens=150, temperature=0.7, top_p=0.9, do_sample=True)
    # Return only the text generated after the response marker
    return tokenizer.decode(output[0], skip_special_tokens=True).split("### Response:")[-1].strip()

print(ask("What is your name?"))
```
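The snippet above loads every variant in fp16. To reproduce the low-memory setting that QLoRA targets, the base model can instead be loaded in 4-bit. The sketch below assumes `bitsandbytes` is installed and uses the common NF4 recipe; the quantization settings are assumptions, not the exact training configuration. Note the adapter is kept attached rather than merged, since merging into quantized weights is not done here.

```python
# Optional: load the base model in 4-bit for the QLoRA variant.
# A sketch assuming `bitsandbytes` is installed; the NF4 settings below
# are assumptions, not the exact training configuration.
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel
import torch

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
)

base = AutoModelForCausalLM.from_pretrained(
    "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
    quantization_config=bnb_config,
    device_map="auto",
)
# Keep the adapter attached; skip merge_and_unload() on quantized weights
model = PeftModel.from_pretrained(base, "qlora-tinyllama-final")
model.eval()
```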