Model Card for Amirhossein75/LLM-Decoder-Tuning-Text-Classification
One‑line summary: Decoder‑only LLMs (e.g., Llama‑3.2‑1B) fine‑tuned for multi‑label text classification using LoRA adapters, with optional 4‑bit QLoRA quantization for memory‑efficient training and inference. A clean CLI and YAML config make it easy to reproduce results and swap backbones.
This model card accompanies the repository LLM‑Decoder‑Tuning‑Text‑Classification and documents a practical recipe for using decoder‑only LLMs as strong multi‑label classifiers with parameter‑efficient fine‑tuning (PEFT).
Note: This card describes a training pipeline + example checkpoints. If you push a specific checkpoint to the Hub, please fill in exact dataset splits, metrics, and license at upload time.
Model Details
Model Description
This project provides a modular training & inference stack for multi‑label text classification built on top of Hugging Face Transformers and PEFT. It adapts decoder‑only LLMs (tested with meta-llama/Llama-3.2-1B) using LoRA adapters, and optionally enables 4‑bit quantization (QLoRA‑style) for a reduced memory footprint during training and inference. The repository exposes a single CLI for train/eval/predict and a YAML configuration to control data paths, model choice, and hyperparameters.
- Developed by: Amirhossein Yousefi (GitHub: amirhossein-yousefi; Hugging Face: Amirhossein75)
- Model type: Decoder‑only causal LM with PEFT (LoRA) for multi‑label classification
- Language(s): English (evaluated on an AmazonCat‑13K subset)
- License: The base model (meta-llama/Llama-3.2-1B) is under the Llama 3.2 Community License. The LoRA adapter you publish should declare its own license and acknowledge the base‑model terms.
- Finetuned from: meta-llama/Llama-3.2-1B (foundation)
Model Sources
- Repository: https://github.com/amirhossein-yousefi/LLM-Decoder-Tuning-Text-Classification
- Model (Hub placeholder): https://huggingface.co/Amirhossein75/LLM-Decoder-Tuning-Text-Classification
- Background reading:
- LoRA: Low‑Rank Adaptation of Large Language Models (Hu et al., 2021)
- QLoRA: Efficient Finetuning of Quantized LLMs (Dettmers et al., 2023)
- PEFT documentation (Hugging Face)
Uses
Direct Use
- Multi‑label text classification on English corpora (e.g., product tagging, topic tagging, content routing).
- Inference via:
  - Provided CLI (python -m llm_cls.cli predict --config ...) producing JSONL predictions.
  - Hugging Face pipelines with the base model + LoRA adapter loaded (see “How to Get Started”).
Downstream Use
- Domain transfer: Re‑train on your domain labels by pointing the YAML to your CSVs.
- Backbone swap: Replace model.model_name in the config to try other decoders or encoders (set use_4bit=false for encoders).
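For illustration, a minimal sketch of scripting such a swap against the YAML config; the model.model_name and use_4bit keys come from this card, while the exact nesting of use_4bit, the replacement backbone ID, and the output path are assumptions.

import yaml  # pip install pyyaml

with open("configs/default.yaml") as f:
    cfg = yaml.safe_load(f)

# Point the pipeline at a different backbone (hypothetical ID shown here).
cfg["model"]["model_name"] = "Qwen/Qwen2.5-0.5B"
# Assumed nesting: the card documents the key name use_4bit, not its location.
cfg["model"]["use_4bit"] = False  # disable 4-bit for encoders or without bitsandbytes

with open("configs/my_domain.yaml", "w") as f:  # hypothetical output path
    yaml.safe_dump(cfg, f)

# Then train with: python -m llm_cls.cli train --config configs/my_domain.yaml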
Out‑of‑Scope Use
- Safety‑critical decisions without human oversight.
- Tasks requiring extreme multilabel scaling (e.g., hundreds of thousands of labels) without additional adaptation.
- Non‑English or code‑mixed data without validation.
- Any use that conflicts with the base model’s license and acceptable‑use policies.
Bias, Risks, and Limitations
- Dataset bias: AmazonCat‑13K originates from product data; labels and text reflect marketplace distributions and may encode demographic or topical biases.
- Multi‑label long tail: Minority classes are harder; macro‑F1 often trails micro‑F1. Consider class weighting, augmentation, or threshold tuning.
- Decoder framing: Treating classification as generation can be sensitive to prompt/format and decoding thresholds.
- License & usage constraints: Ensure compliance with the Llama 3.2 Community License for derivatives and deployment.
Recommendations
- Track micro‑ and macro‑F1 and per‑class metrics.
- Use threshold tuning on validation to balance precision/recall per class (see the sketch after this list).
- For memory‑constrained environments, prefer 4‑bit + LoRA; otherwise disable 4‑bit on platforms without bitsandbytes support.
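A minimal sketch of per‑class threshold tuning on a validation set, assuming you can obtain per‑label probabilities from the model; the array names below are illustrative and not part of the repo.

import numpy as np
from sklearn.metrics import f1_score

def tune_thresholds(val_probs, val_true, grid=np.linspace(0.05, 0.95, 19)):
    # val_probs, val_true: arrays of shape [n_samples, n_labels]
    n_labels = val_true.shape[1]
    thresholds = np.full(n_labels, 0.5)
    for j in range(n_labels):
        scores = [
            f1_score(val_true[:, j], (val_probs[:, j] >= t).astype(int), zero_division=0)
            for t in grid
        ]
        thresholds[j] = grid[int(np.argmax(scores))]
    return thresholds

# At inference time: preds = (test_probs >= thresholds).astype(int)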
How to Get Started with the Model
Below is an example of loading a base Llama model with a LoRA adapter for classification‑style inference. Replace BASE_MODEL and ADAPTER_REPO with your IDs.
from transformers import AutoTokenizer, AutoModelForCausalLM, TextGenerationPipeline
from peft import PeftModel
import torch
BASE_MODEL = "meta-llama/Llama-3.2-1B"
ADAPTER_REPO = "Amirhossein75/LLM-Decoder-Tuning-Text-Classification" # or your own adapter
tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL, use_fast=True)
base = AutoModelForCausalLM.from_pretrained(
BASE_MODEL,
torch_dtype=torch.bfloat16,
device_map="auto",
)
model = PeftModel.from_pretrained(base, ADAPTER_REPO)
model.eval()
# Simple prompt format for multi-label classification (adjust to your training format).
labels = ["books","movies_tv","music","pop","literature_fiction","movies","education_reference","rock","used_rental_textbooks","new"]
text = "A thrilling space opera with deep character arcs and rich world-building."
prompt = (
"You are a classifier. Given the text, return a JSON list of applicable labels from this set: "
+ ", ".join(labels) + ".\n"
+ f"Text: {text}\nLabels: "
)
pipe = TextGenerationPipeline(model=model, tokenizer=tokenizer)  # device placement is already handled by device_map="auto"
out = pipe(prompt, max_new_tokens=64, do_sample=False)
print(out[0]["generated_text"])
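Because the model emits free text, you likely need to map the generation back to labels. A hedged sketch follows; the exact output format depends on your training prompt, so treat the parsing below as illustrative rather than the repo’s own post‑processing.

import json, re

def parse_labels(generated: str, label_set):
    # Keep only the text after the "Labels:" marker used in the prompt above.
    tail = generated.split("Labels:")[-1]
    match = re.search(r"\[.*?\]", tail, flags=re.DOTALL)
    if match:
        try:
            candidates = json.loads(match.group(0))
        except json.JSONDecodeError:
            candidates = []
    else:
        candidates = [part.strip() for part in tail.split(",")]
    if not isinstance(candidates, list):
        candidates = []
    # Return only labels from the known label set, in a stable order.
    return [lbl for lbl in label_set if lbl in candidates]

print(parse_labels(out[0]["generated_text"], labels))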
For CLI usage:
# Train
python -m llm_cls.cli train --config configs/default.yaml
# Predict
python -m llm_cls.cli predict --config configs/default.yaml --input_csv data/test.csv --output_jsonl preds.jsonl
Training Details
Training Data
- Dataset: AmazonCat‑13K (example subset; top‑10 categories for illustration). If you use the full dataset, update CSV paths and label columns accordingly.
- Format: CSV with at least a text column and one or more label columns (multi‑label). Configure the column names in configs/default.yaml.
- Splits: Train / Validation / (Optional) Test; sample scripts are provided to create CSV splits.
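A hedged sketch of producing train/validation/test CSVs in the expected shape; the source file path and exact column names are assumptions to be aligned with configs/default.yaml, and the repo also ships its own split script.

import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical source CSV with a "text" column and one binary column per label.
df = pd.read_csv("data/amazoncat13k_top10.csv")

train_df, temp_df = train_test_split(df, test_size=0.2, random_state=42)
val_df, test_df = train_test_split(temp_df, test_size=0.5, random_state=42)

train_df.to_csv("data/train.csv", index=False)
val_df.to_csv("data/val.csv", index=False)
test_df.to_csv("data/test.csv", index=False)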
Training Procedure
Preprocessing
- Tokenization with the base model’s tokenizer.
- Optional script to prepare AmazonCat‑13K CSVs (see split_amazon_13k_data.py in the repo).
Training Hyperparameters (illustrative config)
- Base model: meta-llama/Llama-3.2-1B
- Problem type: multi_label_classification
- Precision / quantization: use_4bit: true (QLoRA‑style); torch_dtype: bfloat16 for computation
- LoRA: r=2, alpha=2, dropout=0.05 (see the configuration sketch after this list)
- LoRA target modules: ["q_proj","k_proj","v_proj","o_proj","gate_proj","down_proj","up_proj"]
- Batch size: 4 (with gradient_accumulation_steps=8)
- Max length: 1024
- Optimizer: 8‑bit optimizer when quantized (optim_8bit_when_4bit: true)
- Epochs: up to 20 with early stopping (patience=2)
- Metric for best model: f1_micro
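A hedged sketch of wiring these values into PEFT and bitsandbytes objects; it mirrors the hyperparameters listed above but is not the repo’s exact construction code, and whether the repo attaches a classification head (as assumed here) or keeps the LM head is not confirmed by this card.

import torch
from transformers import AutoModelForSequenceClassification, BitsAndBytesConfig
from peft import LoraConfig, TaskType, get_peft_model

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # use_4bit: true (QLoRA-style)
    bnb_4bit_compute_dtype=torch.bfloat16,  # torch_dtype: bfloat16 for computation
)

lora_config = LoraConfig(
    r=2,
    lora_alpha=2,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "down_proj", "up_proj"],
    task_type=TaskType.SEQ_CLS,             # assumption: classification-head setup
)

base = AutoModelForSequenceClassification.from_pretrained(
    "meta-llama/Llama-3.2-1B",
    num_labels=10,                          # top-10 AmazonCat-13K categories in the example
    problem_type="multi_label_classification",
    quantization_config=bnb_config,
    device_map="auto",
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()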
Speeds, Sizes, Times (example run)
- Device: NVIDIA GeForce RTX 3080 Ti Laptop GPU (16 GB VRAM)
- Runtime: ~1,310 seconds for the best run
- Throughput: ≈0.784 steps/s (≈24.9 samples/s) during training
- Artifacts: Reproducible outputs under outputs/<model_name>/<dataset_name>/run_<i>/
Evaluation
Testing Data, Factors & Metrics
- Testing data: Held‑out split from AmazonCat‑13K (example subset).
- Factors: Evaluate both micro‑F1 (overall) and macro‑F1 (per‑class average) to reflect long‑tail performance.
- Metrics: f1_micro, f1_macro, eval loss, throughput (steps/s, samples/s).
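A minimal sketch of how these F1 variants are computed on multi‑hot arrays (toy data, illustrative only):

import numpy as np
from sklearn.metrics import f1_score

y_true = np.array([[1, 0, 1], [0, 1, 0], [1, 1, 0]])  # toy ground-truth multi-hot labels
y_pred = np.array([[1, 0, 0], [0, 1, 0], [1, 0, 0]])  # toy thresholded predictions

print("f1_micro:", f1_score(y_true, y_pred, average="micro", zero_division=0))
print("f1_macro:", f1_score(y_true, y_pred, average="macro", zero_division=0))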
Metrics
- Best overall (micro-F1): 0.830 at 5 epochs
- Best minority‑class sensitivity (macro-F1): 0.752 at 6 epochs
- Average across 4 runs: micro‑F1 0.824, macro‑F1 0.741, eval loss 0.161
- Throughput: train ≈ 0.784 steps/s (≈24.9 samples/s); eval time ≈ 34.0 s per run.
Interpretation: going from 4 → 5 epochs gives the best micro‑F1; 6 epochs squeezes out the top macro‑F1, hinting at slightly better coverage of minority classes with a tiny trade‑off in micro‑F1.
📈 Per‑run metrics
Run | Epochs | Train Loss | Eval Loss | F1 (micro) | F1 (macro) | Train Time (s) | Train steps/s | Train samples/s | Eval Time (s) |
---|---|---|---|---|---|---|---|---|---|
1 | 4 | 1.400 | 0.157 | 0.824 | 0.738 | 1309.6 | 0.962 | 30.543 | 33.6 |
2 | 5 | 1.220 | 0.159 | 0.830 | 0.743 | 1640.3 | 0.768 | 24.385 | 34.0 |
3 | 6 | 1.063 | 0.162 | 0.826 | 0.752 | 1984.2 | 0.635 | 20.159 | 34.4 |
4 | 5 | 1.265 | 0.165 | 0.816 | 0.729 | 1639.3 | 0.769 | 24.401 | 34.0 |
F1(micro) aggregates decisions over all samples; F1(macro) averages per‑class F1 equally, highlighting minority‑class performance.
Results (example)
- Best micro‑F1: 0.830 at 5 epochs
- Best macro‑F1: 0.752 at 6 epochs
- Average across 4 runs: micro‑F1 0.824, macro‑F1 0.741, eval loss 0.161
Summary
Decoder‑only LLMs with LoRA adapters provide competitive multi‑label performance with small memory/compute budgets. Slightly longer training (5–6 epochs) can improve macro‑F1, capturing more minority labels with minimal micro‑F1 trade‑off.
Model Examination
- Inspect confidence/threshold curves per label to tune decision thresholds.
- Use error analysis on false negatives for long‑tail labels; consider reweighting or augmentation.
Environmental Impact
- Hardware Type: Single laptop GPU (RTX 3080 Ti Laptop, 16 GB)
- Hours used (example run): ~0.36 hours
Technical Specifications
Model Architecture and Objective
- Architecture: Decoder‑only Transformer (Llama 3.2 class), adapted via LoRA.
- Objective: Multi‑label classification formulated as conditional generation with sigmoid/thresholding for label decisions.
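A minimal sketch of the sigmoid/thresholding step, assuming per‑label logits are available; the logits and the 0.5 cutoff below are illustrative.

import torch

logits = torch.tensor([[2.1, -0.3, 0.8], [-1.5, 0.2, 3.0]])  # [batch, n_labels]
probs = torch.sigmoid(logits)
preds = (probs >= 0.5).int()  # 1 marks an assigned label
print(preds)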
Compute Infrastructure
Hardware
- Laptop with NVIDIA GeForce RTX 3080 Ti (laptop) GPU, 16 GB VRAM.
Software
- Python, PyTorch, Hugging Face Transformers, PEFT, (optional) bitsandbytes for 4‑bit.
Citation
If you use this work, please consider citing the following:
BibTeX:
@article{Hu2021LoRA,
title={LoRA: Low-Rank Adaptation of Large Language Models},
author={Edward J. Hu and Yelong Shen and Phillip Wallis and Zeyuan Allen-Zhu and Yuanzhi Li and Shean Wang and Lu Wang and Weizhu Chen},
journal={arXiv preprint arXiv:2106.09685},
year={2021}
}
@article{Dettmers2023QLoRA,
title={QLoRA: Efficient Finetuning of Quantized LLMs},
author={Tim Dettmers and Artidoro Pagnoni and Ari Holtzman and Luke Zettlemoyer},
journal={arXiv preprint arXiv:2305.14314},
year={2023}
}
APA:
- Hu, E. J., Shen, Y., Wallis, P., Allen‑Zhu, Z., Li, Y., Wang, S., Wang, L., & Chen, W. (2021). LoRA: Low‑Rank Adaptation of Large Language Models. arXiv:2106.09685.
- Dettmers, T., Pagnoni, A., Holtzman, A., & Zettlemoyer, L. (2023). QLoRA: Efficient Finetuning of Quantized LLMs. arXiv:2305.14314.
Glossary
- LoRA: Low‑Rank Adaptation; injects small trainable matrices into a frozen backbone to adapt it efficiently.
- QLoRA (4‑bit): Finetuning with the backbone quantized to 4‑bit precision, training only LoRA adapters.
- Micro‑/Macro‑F1: Micro aggregates over all instances; Macro averages over classes equally (sensitive to minority classes).
More Information
- The repo ships a minimal CLI (llm_cls/cli.py) and an example YAML config (configs/default.yaml) to reproduce results.
- For non‑Linux environments, or if bitsandbytes is unavailable, disable 4‑bit and train in standard precision.
Model Card Authors
- Author/Maintainer: Amirhossein Yousefi (amirhossein-yousefi / Amirhossein75)
Model Card Contact
- Open an issue in the GitHub repository or contact the Hugging Face user Amirhossein75.