Model Card for Amirhossein75/LLM-Decoder-Tuning-Text-Classification
One‑line summary: Decoder‑only LLMs (e.g., Llama‑3.2‑1B) fine‑tuned for multi‑label text classification using LoRA adapters, with optional 4‑bit QLoRA quantization for memory‑efficient training and inference. A clean CLI and YAML config make it easy to reproduce results and swap backbones.
This model card accompanies the repository LLM‑Decoder‑Tuning‑Text‑Classification and documents a practical recipe for using decoder‑only LLMs as strong multi‑label classifiers with parameter‑efficient fine‑tuning (PEFT).
Note: This card describes a training pipeline + example checkpoints. If you push a specific checkpoint to the Hub, please fill in exact dataset splits, metrics, and license at upload time.
Model Details
Model Description
This project provides a modular training & inference stack for multi‑label text classification built on top of Hugging Face Transformers and PEFT. It adapts decoder‑only LLMs (tested with meta-llama/Llama-3.2-1B) using LoRA adapters, and optionally enables 4‑bit quantization (QLoRA‑style) for a reduced memory footprint during training and inference. The repository exposes a single CLI for train/eval/predict and a YAML configuration to control data paths, model choice, and hyperparameters.
- Developed by: Amirhossein Yousefi (GitHub: amirhossein-yousefi; Hugging Face: Amirhossein75)
- Model type: Decoder‑only causal LM with PEFT (LoRA) for multi‑label classification
- Language(s): English (evaluated on an AmazonCat‑13K subset)
- License: The base model (meta-llama/Llama-3.2-1B) is under the Llama 3.2 Community License. The LoRA adapter you publish should declare its own license and acknowledge the base‑model terms.
- Finetuned from: meta-llama/Llama-3.2-1B (foundation)
Model Sources
- Repository: https://github.com/amirhossein-yousefi/LLM-Decoder-Tuning-Text-Classification
- Model (Hub placeholder): https://huggingface.co/Amirhossein75/LLM-Decoder-Tuning-Text-Classification
- Background reading:
- LoRA: Low‑Rank Adaptation of Large Language Models (Hu et al., 2021)
- QLoRA: Efficient Finetuning of Quantized LLMs (Dettmers et al., 2023)
- PEFT documentation (Hugging Face)
Uses
Direct Use
- Multi‑label text classification on English corpora (e.g., product tagging, topic tagging, content routing).
- Inference via:
  - Provided CLI (python -m llm_cls.cli predict --config ...) producing JSONL predictions.
  - Hugging Face pipelines with the base model + LoRA adapter loaded (see “How to Get Started”).
Downstream Use
- Domain transfer: Re‑train on your domain labels by pointing the YAML to your CSVs.
- Backbone swap: Replace model.model_name in the config to try other decoders or encoders (set use_4bit=false for encoders).
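For illustration, a minimal sketch of scripting such a swap against the YAML config; the model.model_name and use_4bit keys come from this card, while the exact nesting of use_4bit, the replacement backbone ID, and the output path are assumptions.

import yaml  # pip install pyyaml

with open("configs/default.yaml") as f:
    cfg = yaml.safe_load(f)

# Point the pipeline at a different backbone (hypothetical ID shown here).
cfg["model"]["model_name"] = "Qwen/Qwen2.5-0.5B"
# Assumed nesting: the card documents the key name use_4bit, not its location.
cfg["model"]["use_4bit"] = False  # disable 4-bit for encoders or without bitsandbytes

with open("configs/my_domain.yaml", "w") as f:  # hypothetical output path
    yaml.safe_dump(cfg, f)

# Then train with: python -m llm_cls.cli train --config configs/my_domain.yaml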
Out‑of‑Scope Use
- Safety‑critical decisions without human oversight.
- Tasks requiring extreme multilabel scaling (e.g., hundreds of thousands of labels) without additional adaptation.
- Non‑English or code‑mixed data without validation.
- Any use that conflicts with the base model’s license and acceptable‑use policies.
Bias, Risks, and Limitations
- Dataset bias: AmazonCat‑13K originates from product data; labels and text reflect marketplace distributions and may encode demographic or topical biases.
- Multi‑label long tail: Minority classes are harder; macro‑F1 often trails micro‑F1. Consider class weighting, augmentation, or threshold tuning.
- Decoder framing: Treating classification as generation can be sensitive to prompt/format and decoding thresholds.
- License & usage constraints: Ensure compliance with the Llama 3.2 Community License for derivatives and deployment.
Recommendations
- Track micro‑ and macro‑F1 and per‑class metrics.
- Use threshold tuning on validation to balance precision/recall per class (see the sketch after this list).
- For memory‑constrained environments, prefer 4‑bit + LoRA; otherwise disable 4‑bit on platforms without bitsandbytes support.
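A minimal sketch of per‑class threshold tuning on a validation set, assuming you can obtain per‑label probabilities from the model; the array names below are illustrative and not part of the repo.

import numpy as np
from sklearn.metrics import f1_score

def tune_thresholds(val_probs, val_true, grid=np.linspace(0.05, 0.95, 19)):
    # val_probs, val_true: arrays of shape [n_samples, n_labels]
    n_labels = val_true.shape[1]
    thresholds = np.full(n_labels, 0.5)
    for j in range(n_labels):
        scores = [
            f1_score(val_true[:, j], (val_probs[:, j] >= t).astype(int), zero_division=0)
            for t in grid
        ]
        thresholds[j] = grid[int(np.argmax(scores))]
    return thresholds

# At inference time: preds = (test_probs >= thresholds).astype(int)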
How to Get Started with the Model
Below is an example of loading a base Llama model with a LoRA adapter for classification‑style inference. Replace BASE_MODEL and ADAPTER_REPO with your IDs.
from transformers import AutoTokenizer, AutoModelForCausalLM, TextGenerationPipeline
from peft import PeftModel
import torch
BASE_MODEL = "meta-llama/Llama-3.2-1B"
ADAPTER_REPO = "Amirhossein75/LLM-Decoder-Tuning-Text-Classification" # or your own adapter
tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL, use_fast=True)
base = AutoModelForCausalLM.from_pretrained(
BASE_MODEL,
torch_dtype=torch.bfloat16,
device_map="auto",
)
model = PeftModel.from_pretrained(base, ADAPTER_REPO)
model.eval()
# Simple prompt format for multi-label classification (adjust to your training format).
labels = ["books","movies_tv","music","pop","literature_fiction","movies","education_reference","rock","used_rental_textbooks","new"]
text = "A thrilling space opera with deep character arcs and rich world-building."
prompt = (
"You are a classifier. Given the text, return a JSON list of applicable labels from this set: "
+ ", ".join(labels) + ".\n"
+ f"Text: {text}\nLabels: "
)
pipe = TextGenerationPipeline(model=model, tokenizer=tokenizer)  # device placement is already handled by device_map="auto"
out = pipe(prompt, max_new_tokens=64, do_sample=False)
print(out[0]["generated_text"])
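Because the model emits free text, you likely need to map the generation back to labels. A hedged sketch follows; the exact output format depends on your training prompt, so treat the parsing below as illustrative rather than the repo’s own post‑processing.

import json, re

def parse_labels(generated: str, label_set):
    # Keep only the text after the "Labels:" marker used in the prompt above.
    tail = generated.split("Labels:")[-1]
    match = re.search(r"\[.*?\]", tail, flags=re.DOTALL)
    if match:
        try:
            candidates = json.loads(match.group(0))
        except json.JSONDecodeError:
            candidates = []
    else:
        candidates = [part.strip() for part in tail.split(",")]
    if not isinstance(candidates, list):
        candidates = []
    # Return only labels from the known label set, in a stable order.
    return [lbl for lbl in label_set if lbl in candidates]

print(parse_labels(out[0]["generated_text"], labels))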
For CLI usage:
# Train
python -m llm_cls.cli train --config configs/default.yaml
# Predict
python -m llm_cls.cli predict --config configs/default.yaml --input_csv data/test.csv --output_jsonl preds.jsonl
Training Details
Training Data
- Dataset: AmazonCat‑13K (example subset; top‑10 categories for illustration). If you use the full dataset, update CSV paths and label columns accordingly.
- Format: CSV with at least a text column and one or more label columns (multi‑label). Configure the column names in configs/default.yaml.
- Splits: Train / Validation / (Optional) Test; sample scripts are provided to create CSV splits.
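A hedged sketch of producing train/validation/test CSVs in the expected shape; the source file path and exact column names are assumptions to be aligned with configs/default.yaml, and the repo also ships its own split script.

import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical source CSV with a "text" column and one binary column per label.
df = pd.read_csv("data/amazoncat13k_top10.csv")

train_df, temp_df = train_test_split(df, test_size=0.2, random_state=42)
val_df, test_df = train_test_split(temp_df, test_size=0.5, random_state=42)

train_df.to_csv("data/train.csv", index=False)
val_df.to_csv("data/val.csv", index=False)
test_df.to_csv("data/test.csv", index=False)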
Training Procedure
Preprocessing
- Tokenization with the base model’s tokenizer.
- Optional script to prepare AmazonCat‑13K CSVs (see split_amazon_13k_data.py in the repo).
Training Hyperparameters (illustrative config)
- Base model: meta-llama/Llama-3.2-1B
- Problem type: multi_label_classification
- Precision / quantization: use_4bit: true (QLoRA‑style); torch_dtype: bfloat16 for computation
- LoRA: r=2, alpha=2, dropout=0.05 (see the configuration sketch after this list)
- LoRA target modules: ["q_proj","k_proj","v_proj","o_proj","gate_proj","down_proj","up_proj"]
- Batch size: 4 (with gradient_accumulation_steps=8)
- Max length: 1024
- Optimizer: 8‑bit optimizer when quantized (optim_8bit_when_4bit: true)
- Epochs: up to 20 with early stopping (patience=2)
- Metric for best model: f1_micro
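A hedged sketch of wiring these values into PEFT and bitsandbytes objects; it mirrors the hyperparameters listed above but is not the repo’s exact construction code, and whether the repo attaches a classification head (as assumed here) or keeps the LM head is not confirmed by this card.

import torch
from transformers import AutoModelForSequenceClassification, BitsAndBytesConfig
from peft import LoraConfig, TaskType, get_peft_model

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # use_4bit: true (QLoRA-style)
    bnb_4bit_compute_dtype=torch.bfloat16,  # torch_dtype: bfloat16 for computation
)

lora_config = LoraConfig(
    r=2,
    lora_alpha=2,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "down_proj", "up_proj"],
    task_type=TaskType.SEQ_CLS,             # assumption: classification-head setup
)

base = AutoModelForSequenceClassification.from_pretrained(
    "meta-llama/Llama-3.2-1B",
    num_labels=10,                          # top-10 AmazonCat-13K categories in the example
    problem_type="multi_label_classification",
    quantization_config=bnb_config,
    device_map="auto",
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()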
Speeds, Sizes, Times (example run)
- Device: NVIDIA GeForce RTX 3080 Ti Laptop GPU (16 GB VRAM)
- Runtime: ~1,310 seconds for the best run
- Throughput: ≈0.784 steps/s (≈24.9 samples/s) during training
- Artifacts: Reproducible outputs under outputs/<model_name>/<dataset_name>/run_<i>/
Evaluation
Testing Data, Factors & Metrics
- Testing data: Held‑out split from AmazonCat‑13K (example subset).
- Factors: Evaluate both micro‑F1 (overall) and macro‑F1 (per‑class average) to reflect long‑tail performance.
- Metrics: f1_micro, f1_macro, eval loss, throughput (steps/s, samples/s).
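A minimal sketch of how these F1 variants are computed on multi‑hot arrays (toy data, illustrative only):

import numpy as np
from sklearn.metrics import f1_score

y_true = np.array([[1, 0, 1], [0, 1, 0], [1, 1, 0]])  # toy ground-truth multi-hot labels
y_pred = np.array([[1, 0, 0], [0, 1, 0], [1, 0, 0]])  # toy thresholded predictions

print("f1_micro:", f1_score(y_true, y_pred, average="micro", zero_division=0))
print("f1_macro:", f1_score(y_true, y_pred, average="macro", zero_division=0))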
Metrics
- Best overall (micro-F1): 0.830 at 5 epochs
- Best minority‑class sensitivity (macro-F1): 0.752 at 6 epochs
- Average across 4 runs: micro‑F1 0.824, macro‑F1 0.741, eval loss 0.161
- Throughput: train ≈ 0.784 steps/s (≈24.9 samples/s); eval time ≈ 34.0 s per run.
Interpretation: going from 4 → 5 epochs gives the best micro‑F1; 6 epochs squeezes out the top macro‑F1, hinting at slightly better coverage of minority classes with a tiny trade‑off in micro‑F1.
📈 Per‑run metrics
Run | Epochs | Train Loss | Eval Loss | F1 (micro) | F1 (macro) | Train Time (s) | Train steps/s | Train samples/s | Eval Time (s) |
---|---|---|---|---|---|---|---|---|---|
1 | 4 | 1.400 | 0.157 | 0.824 | 0.738 | 1309.6 | 0.962 | 30.543 | 33.6 |
2 | 5 | 1.220 | 0.159 | 0.830 | 0.743 | 1640.3 | 0.768 | 24.385 | 34.0 |
3 | 6 | 1.063 | 0.162 | 0.826 | 0.752 | 1984.2 | 0.635 | 20.159 | 34.4 |
4 | 5 | 1.265 | 0.165 | 0.816 | 0.729 | 1639.3 | 0.769 | 24.401 | 34.0 |
F1(micro) aggregates decisions over all samples; F1(macro) averages per‑class F1 equally, highlighting minority‑class performance.
Results (example)
- Best micro‑F1: 0.830 at 5 epochs
- Best macro‑F1: 0.752 at 6 epochs
- Average across 4 runs: micro‑F1 0.824, macro‑F1 0.741, eval loss 0.161
Summary
Decoder‑only LLMs with LoRA adapters provide competitive multi‑label performance with small memory/compute budgets. Slightly longer training (5–6 epochs) can improve macro‑F1, capturing more minority labels with minimal micro‑F1 trade‑off.
Model Examination
- Inspect confidence/threshold curves per label to tune decision thresholds.
- Use error analysis on false negatives for long‑tail labels; consider reweighting or augmentation.
Environmental Impact
- Hardware Type: Single laptop GPU (RTX 3080 Ti Laptop, 16 GB)
- Hours used (example run): ~0.36 hours
Technical Specifications
Model Architecture and Objective
- Architecture: Decoder‑only Transformer (Llama 3.2 class), adapted via LoRA.
- Objective: Multi‑label classification formulated as conditional generation with sigmoid/thresholding for label decisions.
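A minimal sketch of the sigmoid/thresholding step, assuming per‑label logits are available; the logits and the 0.5 cutoff below are illustrative.

import torch

logits = torch.tensor([[2.1, -0.3, 0.8], [-1.5, 0.2, 3.0]])  # [batch, n_labels]
probs = torch.sigmoid(logits)
preds = (probs >= 0.5).int()  # 1 marks an assigned label
print(preds)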
Compute Infrastructure
Hardware
- Laptop with NVIDIA GeForce RTX 3080 Ti (laptop) GPU, 16 GB VRAM.
Software
- Python, PyTorch, Hugging Face Transformers, PEFT, (optional) bitsandbytes for 4‑bit.
Citation
If you use this work, please consider citing the following:
BibTeX:
@article{Hu2021LoRA,
title={LoRA: Low-Rank Adaptation of Large Language Models},
author={Edward J. Hu and Yelong Shen and Phillip Wallis and Zeyuan Allen-Zhu and Yuanzhi Li and Shean Wang and Lu Wang and Weizhu Chen},
journal={arXiv preprint arXiv:2106.09685},
year={2021}
}
@article{Dettmers2023QLoRA,
title={QLoRA: Efficient Finetuning of Quantized LLMs},
author={Tim Dettmers and Artidoro Pagnoni and Ari Holtzman and Luke Zettlemoyer},
journal={arXiv preprint arXiv:2305.14314},
year={2023}
}
APA:
- Hu, E. J., Shen, Y., Wallis, P., Allen‑Zhu, Z., Li, Y., Wang, S., Wang, L., & Chen, W. (2021). LoRA: Low‑Rank Adaptation of Large Language Models. arXiv:2106.09685.
- Dettmers, T., Pagnoni, A., Holtzman, A., & Zettlemoyer, L. (2023). QLoRA: Efficient Finetuning of Quantized LLMs. arXiv:2305.14314.
Glossary
- LoRA: Low‑Rank Adaptation; injects small trainable matrices into a frozen backbone to adapt it efficiently.
- QLoRA (4‑bit): Finetuning with the backbone quantized to 4‑bit precision, training only LoRA adapters.
- Micro‑/Macro‑F1: Micro aggregates over all instances; Macro averages over classes equally (sensitive to minority classes).
More Information
- The repo ships a minimal CLI (llm_cls/cli.py) and an example YAML config (configs/default.yaml) to reproduce results.
- For non‑Linux environments, or if bitsandbytes is unavailable, disable 4‑bit and train in standard precision.
Model Card Authors
- Author/Maintainer: Amirhossein Yousefi (amirhossein-yousefi / Amirhossein75)
Model Card Contact
- Open an issue in the GitHub repository or contact the Hugging Face user Amirhossein75.