---
license: apache-2.0
inference: true
tags:
  - llama3
  - unsloth
  - qlora
  - peft
  - instruction-tuned
  - text-generation
  - adapter
  - alpaca
  - 4bit
  - merged
library_name: transformers
datasets:
  - yahma/alpaca-cleaned
model-index:
  - name: unsloth-llama3-alpaca-lora
    results:
      - task:
          type: text-generation
          name: Text Generation
        metrics:
          - name: Hallucination Detection (QLoRA-specific)
            type: hallucination-detection
            value: mitigated
          - name: Instruction Match Score
            type: exact-match
            value: 2.3 / 3
            comment: ≥4/6 keyword coverage in 2 out of 3 instructions
          - name: Output Quality (manual)
            type: qualitative
            value: pass
            comment: >-
              Human review confirms adherence to prompt intent in 2/3
              completions
---

# unsloth-llama3-alpaca-lora


A 4-bit QLoRA adapter for `unsloth/llama-3-8b-bnb-4bit`, fine-tuned on the cleaned Stanford Alpaca dataset (~52K instructions). Lightweight, efficient, and open. Built with Unsloth, Hugging Face PEFT, and 🤗 Datasets for low-resource instruction-following tasks. Adapter weights only; reproducible and ready to deploy.

👉 Full training, evaluation, and deployment code available at GitHub: Cre4T3Tiv3/unsloth-llama3-alpaca-lora


## How to Use

### Merge Adapter into Base Model

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

BASE_MODEL = "unsloth/llama-3-8b-bnb-4bit"
ADAPTER = "Cre4T3Tiv3/unsloth-llama3-alpaca-lora"

# Load the 4-bit base model, then attach the LoRA adapter
base_model = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL,
    device_map="auto",
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
)
model = PeftModel.from_pretrained(base_model, ADAPTER)
model = model.merge_and_unload()  # fold the LoRA weights into the base model

tokenizer = AutoTokenizer.from_pretrained(ADAPTER)

# Run inference with the Alpaca-style prompt format
prompt = """### Instruction:
What is QLoRA?

### Response:"""
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
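
If you prefer to keep the adapter unmerged (for example, to swap adapters at runtime, or to skip the dequantize-and-merge step on 4-bit weights), you can generate through the `PeftModel` directly. A minimal sketch, assuming the same model and adapter IDs as above:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

BASE_MODEL = "unsloth/llama-3-8b-bnb-4bit"
ADAPTER = "Cre4T3Tiv3/unsloth-llama3-alpaca-lora"

base = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL,
    device_map="auto",
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
)
model = PeftModel.from_pretrained(base, ADAPTER)  # adapter stays unmerged
tokenizer = AutoTokenizer.from_pretrained(ADAPTER)

prompt = "### Instruction:\nWhat is QLoRA?\n\n### Response:\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```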

## LoRA Training Configuration

| Parameter      | Value                         |
|----------------|-------------------------------|
| Base Model     | `unsloth/llama-3-8b-bnb-4bit` |
| r              | 16                            |
| alpha          | 16                            |
| dropout        | 0.05                          |
| Bits           | 4-bit (bnb)                   |
| Framework      | Unsloth + Hugging Face PEFT   |
| Adapter Format | LoRA (merged post-training)   |
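
For experimentation, the table above maps onto a PEFT `LoraConfig` roughly like the sketch below. Note that `target_modules`, `bias`, and `task_type` are assumptions (the projection layers commonly targeted in Unsloth LLaMA-3 recipes), not values confirmed by this card:

```python
from peft import LoraConfig

# Sketch of the adapter config implied by the table above.
# target_modules, bias, and task_type are assumptions, not confirmed values.
lora_config = LoraConfig(
    r=16,           # LoRA rank
    lora_alpha=16,  # scaling factor
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
)
```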

## Dataset

- yahma/alpaca-cleaned (loading and prompt formatting sketched below)
- Augmented with 30+ grounded examples explaining QLoRA to mitigate hallucinations
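
As a rough illustration of how Alpaca records map onto the `### Instruction:` / `### Response:` format used in the inference example above (the exact formatting function lives in the GitHub repo, so treat this as a sketch):

```python
from datasets import load_dataset

dataset = load_dataset("yahma/alpaca-cleaned", split="train")

def format_example(example):
    # Alpaca records carry "instruction", an optional "input", and "output".
    if example["input"]:
        prompt = (
            f"### Instruction:\n{example['instruction']}\n\n"
            f"### Input:\n{example['input']}\n\n"
            "### Response:\n"
        )
    else:
        prompt = f"### Instruction:\n{example['instruction']}\n\n### Response:\n"
    return {"text": prompt + example["output"]}

dataset = dataset.map(format_example)
print(dataset[0]["text"][:300])
```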

## Hardware Used

- A100 (40 GB VRAM)

## Evaluation

This adapter was evaluated using a custom script that checks for:

- QLoRA hallucination (e.g., "Quantized Linear Regression") ✅ Mitigated
- Keyword coverage across instruction outputs (≥4/6 match)
- Response quality on instruction-following examples

See `eval_adapter.py` in the GitHub repo for reproducibility; an illustrative version of the keyword-coverage check is sketched below.
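
The repo's `eval_adapter.py` is the source of truth; this is only a minimal sketch of the keyword-coverage metric described above (≥4 of 6 expected keywords present in a completion). The keyword list is a hypothetical example, not the one used in the actual evaluation:

```python
# Illustrative keyword-coverage check; NOT the repo's eval_adapter.py.
# The keyword list below is a hypothetical example.
EXPECTED_KEYWORDS = ["quantization", "4-bit", "lora", "adapter", "fine-tuning", "memory"]

def keyword_coverage(completion: str, keywords=EXPECTED_KEYWORDS, threshold: int = 4) -> bool:
    """Return True if at least `threshold` expected keywords appear in the completion."""
    text = completion.lower()
    hits = sum(kw in text for kw in keywords)
    return hits >= threshold

demo = "QLoRA combines 4-bit quantization with LoRA adapters for memory-efficient fine-tuning."
print(keyword_coverage(demo))  # True: all 6 keywords present
```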


## Limitations

- May hallucinate
- Not intended for factual QA or decision-critical workflows
- Output subject to 4-bit quantization limitations

## Intended Use

This adapter is designed for:

- Local inference using QLoRA-efficient weights
- Instruction-following in interactive, UI, or CLI agents
- Experimentation with LoRA/PEFT pipelines
- Educational demos of efficient fine-tuning

## Demo

🖥 Try the adapter in a browser:
👉 HF Space → unsloth-llama3-alpaca-demo


## Built With

- Unsloth
- Hugging Face PEFT
- 🤗 Datasets


## Maintainer

@Cre4T3Tiv3
Built with ❤️ by ByteStack Labs


## Citation

If you use this adapter or its training methodology, please consider citing:

```bibtex
@software{unsloth-llama3-alpaca-lora,
  author = {Jesse Moses (Cre4T3Tiv3)},
  title = {Unsloth LoRA Adapter for LLaMA 3 (8B)},
  year = {2025},
  url = {https://huggingface.co/Cre4T3Tiv3/unsloth-llama3-alpaca-lora}
}
```

## License

Apache 2.0