Model Card for Amirhossein75/LLM-Decoder-Tuning-Text-Classification

One‑line summary: Decoder‑only LLMs (e.g., Llama‑3.2‑1B) fine‑tuned for multi‑label text classification using LoRA adapters, with optional 4‑bit QLoRA quantization for memory‑efficient training and inference. A clean CLI and YAML config make it easy to reproduce results and swap backbones.

This model card accompanies the repository LLM‑Decoder‑Tuning‑Text‑Classification and documents a practical recipe for using decoder‑only LLMs as strong multi‑label classifiers with parameter‑efficient fine‑tuning (PEFT).

Note: This card describes a training pipeline + example checkpoints. If you push a specific checkpoint to the Hub, please fill in exact dataset splits, metrics, and license at upload time.


Model Details

Model Description

This project provides a modular training & inference stack for multi‑label text classification built on top of Hugging Face Transformers and PEFT. It adapts decoder‑only LLMs (tested with meta-llama/Llama-3.2-1B) using LoRA adapters, and optionally enables 4‑bit quantization (QLoRA‑style) for reduced memory footprint during training and inference. The repository exposes a single CLI for train/eval/predict and a YAML configuration to control data paths, model choice, and hyperparameters.

  • Developed by: Amirhossein Yousefi (GitHub: amirhossein-yousefi; Hugging Face: Amirhossein75)
  • Model type: Decoder‑only causal LM with PEFT (LoRA) for multi‑label classification
  • Language(s): English (evaluated on AmazonCat‑13K subset)
  • License: The base model (meta-llama/Llama-3.2-1B) is under the Llama 3.2 Community License. The LoRA adapter you publish should declare its own license and acknowledge base‑model terms.
  • Finetuned from: meta-llama/Llama-3.2-1B (foundation)

Model Sources

  • Repository: LLM-Decoder-Tuning-Text-Classification on GitHub (amirhossein-yousefi)
  • Model hub: Amirhossein75/LLM-Decoder-Tuning-Text-Classification on Hugging Face


Uses

Direct Use

  • Multi‑label text classification on English corpora (e.g., product tagging, topic tagging, content routing).
  • Inference via:
    • Provided CLI (python -m llm_cls.cli predict --config ...) producing JSONL predictions.
    • Hugging Face pipelines with base model + LoRA adapter loaded (see “How to Get Started”).

Downstream Use

  • Domain transfer: Re‑train on your domain labels by pointing the YAML to your CSVs.
  • Backbone swap: Replace model.model_name in the config to try other decoders or encoders (set use_4bit=false for encoders); see the sketch after this list.
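
Backbone and config changes can also be scripted. The snippet below is a minimal sketch that edits the YAML programmatically; the model.model_name key comes from the bullet above, while nesting use_4bit under model (and the replacement model ID) are assumptions about the config layout.

import yaml

with open("configs/default.yaml") as f:
    cfg = yaml.safe_load(f)

# Swap the backbone and disable 4-bit quantization for the new model.
# The exact key nesting may differ in configs/default.yaml.
cfg["model"]["model_name"] = "Qwen/Qwen2.5-0.5B"  # example replacement decoder
cfg["model"]["use_4bit"] = False                  # turn off 4-bit (e.g., no bitsandbytes)

with open("configs/my_domain.yaml", "w") as f:
    yaml.dump(cfg, f, sort_keys=False)

# Then train against the new config:
#   python -m llm_cls.cli train --config configs/my_domain.yaml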

Out‑of‑Scope Use

  • Safety‑critical decisions without human oversight.
  • Tasks requiring extreme multi‑label classification at scale (e.g., hundreds of thousands of labels) without additional adaptation.
  • Non‑English or code‑mixed data without validation.
  • Any use that conflicts with the base model’s license and acceptable‑use policies.

Bias, Risks, and Limitations

  • Dataset bias: AmazonCat‑13K originates from product data; labels and text reflect marketplace distributions and may encode demographic or topical biases.
  • Multi‑label long tail: Minority classes are harder; macro‑F1 often trails micro‑F1. Consider class weighting, augmentation, or threshold tuning.
  • Decoder framing: Treating classification as generation can be sensitive to prompt/format and decoding thresholds.
  • License & usage constraints: Ensure compliance with the Llama 3.2 Community License for derivatives and deployment.

Recommendations

  • Track micro‑ and macro‑F1 and per‑class metrics.
  • Use threshold tuning on the validation set to balance precision and recall per class (see the sketch after this list).
  • For memory‑constrained environments, prefer 4‑bit + LoRA; otherwise disable 4‑bit on platforms without bitsandbytes support.
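
The per‑class threshold tuning recommended above can be a simple grid search over validation probabilities. Below is a minimal sketch with scikit-learn; the array names are illustrative and not part of the repo's API.

import numpy as np
from sklearn.metrics import f1_score

def tune_thresholds(y_true: np.ndarray, y_prob: np.ndarray, grid=np.linspace(0.05, 0.95, 19)):
    """Pick, independently for each label, the threshold that maximizes F1 on validation data."""
    thresholds = np.full(y_true.shape[1], 0.5)
    for c in range(y_true.shape[1]):
        scores = [f1_score(y_true[:, c], (y_prob[:, c] >= t).astype(int), zero_division=0) for t in grid]
        thresholds[c] = grid[int(np.argmax(scores))]
    return thresholds

# Usage: binary_preds = (y_val_prob >= tune_thresholds(y_val_true, y_val_prob)).astype(int)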

How to Get Started with the Model

Below is an example of loading a base Llama model with a LoRA adapter for classification‑style inference. Replace BASE_MODEL and ADAPTER_REPO with your IDs.

from transformers import AutoTokenizer, AutoModelForCausalLM, TextGenerationPipeline
from peft import PeftModel
import torch

BASE_MODEL = "meta-llama/Llama-3.2-1B"
ADAPTER_REPO = "Amirhossein75/LLM-Decoder-Tuning-Text-Classification"  # or your own adapter

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL, use_fast=True)
base = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model = PeftModel.from_pretrained(base, ADAPTER_REPO)
model.eval()

# Simple prompt format for multi-label classification (adjust to your training format).
labels = ["books","movies_tv","music","pop","literature_fiction","movies","education_reference","rock","used_rental_textbooks","new"]
text = "A thrilling space opera with deep character arcs and rich world-building."

prompt = (
    "You are a classifier. Given the text, return a JSON list of applicable labels from this set: "
    + ", ".join(labels) + ".\n"
    + f"Text: {text}\nLabels: "
)

pipe = TextGenerationPipeline(model=model, tokenizer=tokenizer)  # device placement is already handled by device_map="auto"
out = pipe(prompt, max_new_tokens=64, do_sample=False)
print(out[0]["generated_text"])
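
Continuing from the snippet above: the model's continuation is free text, so it should be parsed defensively. A small sketch that assumes the prompt asks for a JSON list and falls back to comma splitting; adjust to the format used during training.

import json

continuation = out[0]["generated_text"][len(prompt):]  # keep only the generated part
try:
    predicted = json.loads(continuation.strip())        # ideally a JSON list of label strings
except json.JSONDecodeError:
    predicted = [t.strip().strip('"[]') for t in continuation.split(",")]
predicted = [p for p in predicted if p in labels]       # keep only known labels
print(predicted)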

For CLI usage:

# Train
python -m llm_cls.cli train --config configs/default.yaml

# Predict
python -m llm_cls.cli predict \
  --config configs/default.yaml \
  --input_csv data/test.csv \
  --output_jsonl preds.jsonl

Training Details

Training Data

  • Dataset: AmazonCat‑13K (example subset; top‑10 categories for illustration). If you use the full dataset, update CSV paths and label columns accordingly.
  • Format: CSV with at least a text column and one or more label columns (multi‑label). Column names are configured in configs/default.yaml; see the toy example after this list.
  • Splits: Train / Validation / (Optional) Test; sample scripts are provided to create CSV splits.
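
A toy illustration of that CSV shape is shown below; the column names and file path are placeholders, so use whatever configs/default.yaml expects.

import pandas as pd

# Placeholder multi-label CSV: one text column plus one 0/1 column per label.
df = pd.DataFrame({
    "text": [
        "A thrilling space opera with deep character arcs.",
        "Acoustic covers of classic rock anthems.",
        "An introductory algebra workbook for high school students.",
    ],
    "books": [1, 0, 1],
    "music": [0, 1, 0],
    "education_reference": [0, 0, 1],
})
df.to_csv("data/train.csv", index=False)  # point the YAML's data paths at files like this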

Training Procedure

Preprocessing

  • Tokenization with the base model’s tokenizer.
  • Optional script to prepare AmazonCat‑13K CSVs (see split_amazon_13k_data.py in the repo).

Training Hyperparameters (illustrative config)

  • Base model: meta-llama/Llama-3.2-1B
  • Problem type: multi_label_classification
  • Precision / quantization: use_4bit: true (QLoRA‑style); torch_dtype: bfloat16 for computation
  • LoRA: r=2, alpha=2, dropout=0.05
  • LoRA target modules: ["q_proj","k_proj","v_proj","o_proj","gate_proj","down_proj","up_proj"]
  • Batch size: 4 (with gradient_accumulation_steps=8)
  • Max length: 1024
  • Optimizer: 8‑bit optimizer when quantized (optim_8bit_when_4bit: true)
  • Epochs: up to 20 with early stopping (patience=2)
  • Metric for best model: f1_micro
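
For reference, the values above map roughly onto the standard bitsandbytes and PEFT configuration objects. This is an illustrative mapping, not the repository's actual code; the NF4 quant type, double quantization, and task type are assumptions.

import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # use_4bit: true
    bnb_4bit_compute_dtype=torch.bfloat16,  # torch_dtype: bfloat16
    bnb_4bit_quant_type="nf4",              # assumption: QLoRA-style NF4
    bnb_4bit_use_double_quant=True,         # assumption
)

lora_config = LoraConfig(
    r=2,
    lora_alpha=2,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "down_proj", "up_proj"],
    task_type="SEQ_CLS",                    # assumption: classification head; could be CAUSAL_LM
)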

Speeds, Sizes, Times (example run)

  • Device: NVIDIA GeForce RTX 3080 Ti Laptop GPU (16 GB VRAM)
  • Runtime: ~1,310 seconds for the 4‑epoch run (see the per‑run table under Evaluation)
  • Throughput: ≈0.784 steps/s (≈24.9 samples/s), averaged over the four runs
  • Artifacts: Reproducible outputs under outputs/<model_name>/<dataset_name>/run_<i>/

Evaluation

Testing Data, Factors & Metrics

  • Testing data: Held‑out split from AmazonCat‑13K (example subset).
  • Factors: Evaluate both micro‑F1 (overall) and macro‑F1 (per‑class average) to reflect long‑tail performance.
  • Metrics: f1_micro, f1_macro, eval loss, throughput (steps/s, samples/s).
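
Both F1 variants can be computed directly from the 0/1 label matrices, for example with scikit-learn:

import numpy as np
from sklearn.metrics import f1_score

# Toy (num_samples, num_labels) matrices; replace with real evaluation outputs.
y_true = np.array([[1, 0, 1], [0, 1, 0], [1, 1, 0]])
y_pred = np.array([[1, 0, 0], [0, 1, 0], [1, 0, 0]])

print("f1_micro:", f1_score(y_true, y_pred, average="micro", zero_division=0))
print("f1_macro:", f1_score(y_true, y_pred, average="macro", zero_division=0))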

Results (example)

  • Best micro‑F1: 0.830 at 5 epochs
  • Best macro‑F1 (minority‑class sensitivity): 0.752 at 6 epochs
  • Average across 4 runs: micro‑F1 0.824, macro‑F1 0.741, eval loss 0.161
  • Throughput (averaged over runs): train ≈ 0.784 steps/s (≈24.9 samples/s); eval time ≈ 34.0 s per run

Interpretation: going from 4 → 5 epochs gives the best micro‑F1; 6 epochs squeezes out the top macro‑F1, hinting at slightly better coverage of minority classes with a tiny trade‑off in micro‑F1.


📈 Per‑run metrics

| Run | Epochs | Train Loss | Eval Loss | F1 (micro) | F1 (macro) | Train Time (s) | Train steps/s | Train samples/s | Eval Time (s) |
|-----|--------|------------|-----------|------------|------------|----------------|---------------|-----------------|---------------|
| 1   | 4      | 1.400      | 0.157     | 0.824      | 0.738      | 1309.6         | 0.962         | 30.543          | 33.6          |
| 2   | 5      | 1.220      | 0.159     | 0.830      | 0.743      | 1640.3         | 0.768         | 24.385          | 34.0          |
| 3   | 6      | 1.063      | 0.162     | 0.826      | 0.752      | 1984.2         | 0.635         | 20.159          | 34.4          |
| 4   | 5      | 1.265      | 0.165     | 0.816      | 0.729      | 1639.3         | 0.769         | 24.401          | 34.0          |

F1(micro) aggregates decisions over all samples; F1(macro) averages per‑class F1 equally, highlighting minority‑class performance.


Summary

Decoder‑only LLMs with LoRA adapters provide competitive multi‑label performance with small memory/compute budgets. Slightly longer training (5–6 epochs) can improve macro‑F1, capturing more minority labels with minimal micro‑F1 trade‑off.


Model Examination

  • Inspect confidence/threshold curves per label to tune decision thresholds.
  • Use error analysis on false negatives for long‑tail labels; consider reweighting or augmentation.

Environmental Impact

  • Hardware Type: Single laptop GPU (RTX 3080 Ti Laptop, 16 GB)
  • Hours used (example run): ~0.36 hours

Technical Specifications

Model Architecture and Objective

  • Architecture: Decoder‑only Transformer (Llama 3.2 class), adapted via LoRA.
  • Objective: Multi‑label classification formulated as conditional generation with sigmoid/thresholding for label decisions.
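
In the classification‑head framing, the label decision reduces to a sigmoid over per‑label logits followed by a threshold; a minimal sketch with illustrative values:

import torch

logits = torch.tensor([[2.1, -0.7, 0.3]])  # (batch, num_labels)
probs = torch.sigmoid(logits)              # independent per-label probabilities
preds = (probs >= 0.5).int()               # 0.5 by default; tune per label on validation data
print(probs, preds)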

Compute Infrastructure

Hardware

  • Laptop with NVIDIA GeForce RTX 3080 Ti (laptop) GPU, 16 GB VRAM.

Software

  • Python, PyTorch, Hugging Face Transformers, PEFT, (optional) bitsandbytes for 4‑bit.

Citation

If you use this work, please consider citing the following:

BibTeX:

@article{Hu2021LoRA,
  title={LoRA: Low-Rank Adaptation of Large Language Models},
  author={Edward J. Hu and Yelong Shen and Phillip Wallis and Zeyuan Allen-Zhu and Yuanzhi Li and Shean Wang and Lu Wang and Weizhu Chen},
  journal={arXiv preprint arXiv:2106.09685},
  year={2021}
}

@article{Dettmers2023QLoRA,
  title={QLoRA: Efficient Finetuning of Quantized LLMs},
  author={Tim Dettmers and Artidoro Pagnoni and Ari Holtzman and Luke Zettlemoyer},
  journal={arXiv preprint arXiv:2305.14314},
  year={2023}
}

APA:

  • Hu, E. J., Shen, Y., Wallis, P., Allen‑Zhu, Z., Li, Y., Wang, S., Wang, L., & Chen, W. (2021). LoRA: Low‑Rank Adaptation of Large Language Models. arXiv:2106.09685.
  • Dettmers, T., Pagnoni, A., Holtzman, A., & Zettlemoyer, L. (2023). QLoRA: Efficient Finetuning of Quantized LLMs. arXiv:2305.14314.

Glossary

  • LoRA: Low‑Rank Adaptation; injects small trainable matrices into a frozen backbone to adapt it efficiently.
  • QLoRA (4‑bit): Finetuning with the backbone quantized to 4‑bit precision, training only LoRA adapters.
  • Micro‑/Macro‑F1: Micro aggregates over all instances; Macro averages over classes equally (sensitive to minority classes).

More Information

  • The repo ships a minimal CLI (llm_cls/cli.py) and example YAML config (configs/default.yaml) to reproduce results.
  • For non‑Linux environments or if bitsandbytes is unavailable, disable 4‑bit and train in standard precision.

Model Card Authors

  • Author/Maintainer: Amirhossein Yousefi (amirhossein-yousefi / Amirhossein75)

Model Card Contact

  • Open an issue in the GitHub repository or contact the Hugging Face user Amirhossein75.