# 🦙 TinyLlama Instruction-Tuned Models: LoRA, AdaLoRA, QLoRA
This repo hosts a set of TinyLlama-1.1B models fine-tuned with several parameter-efficient fine-tuning (PEFT) methods:
- ✅ LoRA (Low-Rank Adaptation)
- ✅ AdaLoRA (Adaptive Low-Rank Adaptation with rank scheduling)
- ✅ QLoRA (Quantized LoRA for low-memory environments)
These models are fine-tuned on a custom instruction-response dataset for general-purpose instruction-following.
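For reference, all three adapters can be expressed as Hugging Face `peft` configurations. The sketch below illustrates what each setup looks like; the hyperparameter values (`r`, `lora_alpha`, `target_modules`, and the AdaLoRA rank schedule) are illustrative assumptions, not the exact settings used to train these checkpoints.

```python
# Minimal sketch of the three PEFT setups, using Hugging Face `peft`.
# All hyperparameter values below are illustrative assumptions, not the
# exact settings used to train these checkpoints.
from peft import LoraConfig, AdaLoraConfig

# LoRA: fixed low-rank update on the attention projections
lora_cfg = LoraConfig(
    r=8,                                  # assumed rank
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # assumed target modules
    task_type="CAUSAL_LM",
)

# AdaLoRA: starts at init_r and prunes ranks down to target_r on a schedule
adalora_cfg = AdaLoraConfig(
    init_r=12,
    target_r=4,
    tinit=100,        # steps before rank pruning starts
    tfinal=500,       # steps of final tuning after pruning ends
    deltaT=10,        # prune every deltaT steps
    total_step=1000,  # assumed total training steps
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)

# QLoRA uses the same LoRA adapter trained on top of a 4-bit quantized
# base model (see the 4-bit loading sketch in the inference section).
```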
## 📦 Model Variants
| Name | Folder Name | Method | Notes |
|---|---|---|---|
| LoRA | `lora-tinyllama-final` | LoRA | Standard fine-tuned model |
| AdaLoRA | `adalora-tinyllama-final` | AdaLoRA | Rank-adaptive LoRA |
| QLoRA | `qlora-tinyllama-final` | QLoRA | Quantized LoRA (int4) |
## 🧠 Base Model
- Base: `TinyLlama/TinyLlama-1.1B-Chat-v1.0`
- Tokenizer: SentencePiece, with `eos_token` used as the padding token
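The adapters expect prompts in the simple instruction template below, the same one applied by the `ask()` helper in the inference example:

```
### Instruction:
{your question}

### Response:
```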
## 🚀 Inference Example
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

base_model = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
lora_dir = "lora-tinyllama-final"  # or "adalora-tinyllama-final", "qlora-tinyllama-final"

# Tokenizer (saved alongside the adapter); reuse EOS as the padding token
tokenizer = AutoTokenizer.from_pretrained(lora_dir)
tokenizer.pad_token = tokenizer.eos_token

# Load the base model, attach the adapter, then merge the LoRA weights
# into the base weights so inference runs without the PEFT wrapper
base = AutoModelForCausalLM.from_pretrained(base_model, torch_dtype=torch.float16, device_map="auto")
model = PeftModel.from_pretrained(base, lora_dir)
model = model.merge_and_unload()
model.eval()

def ask(prompt):
    # Wrap the user prompt in the instruction template used during fine-tuning
    prompt = f"### Instruction:\n{prompt}\n\n### Response:\n"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        output = model.generate(**inputs, max_new_tokens=150, temperature=0.7, top_p=0.9, do_sample=True)
    # Return only the text generated after the response marker
    return tokenizer.decode(output[0], skip_special_tokens=True).split("### Response:")[-1].strip()

print(ask("What is your name?"))
```
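The snippet above loads every variant in fp16. To reproduce the low-memory setting that QLoRA targets, the base model can instead be loaded in 4-bit. The sketch below assumes `bitsandbytes` is installed and uses the common NF4 recipe; the quantization settings are assumptions, not the exact training configuration. Note the adapter is kept attached rather than merged, since merging into quantized weights is not done here.

```python
# Optional: load the base model in 4-bit for the QLoRA variant.
# A sketch assuming `bitsandbytes` is installed; the NF4 settings below
# are assumptions, not the exact training configuration.
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel
import torch

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
)

base = AutoModelForCausalLM.from_pretrained(
    "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
    quantization_config=bnb_config,
    device_map="auto",
)
# Keep the adapter attached; skip merge_and_unload() on quantized weights
model = PeftModel.from_pretrained(base, "qlora-tinyllama-final")
model.eval()
```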