---
# For reference on model card metadata, see the spec: https://github.com/huggingface/hub-docs/blob/main/modelcard.md?plain=1
# Doc / guide: https://huggingface.co/docs/hub/model-cards
datasets:
- "pietrolesci/amazoncat-13k"
language:
- en
library_name: transformers
license: other # Base model (Meta Llama 3.2) is under the Llama 3.2 Community License
pipeline_tag: text-classification
tags:
- multi-label
- LoRA
- QLoRA
- bitsandbytes
- decoder-only
- llama-3.2-1b
- peft
- text-classification
base_model: meta-llama/Llama-3.2-1B
---
# Model Card for Amirhossein75/LLM-Decoder-Tuning-Text-Classification
> **One‑line summary:** Decoder‑only LLMs (e.g., Llama‑3.2‑1B) fine‑tuned for **multi‑label text classification** using **LoRA** adapters, with optional **4‑bit QLoRA** quantization for memory‑efficient training and inference. A clean CLI and YAML config make it easy to reproduce results and swap backbones.
This model card accompanies the repository **LLM‑Decoder‑Tuning‑Text‑Classification** and documents a practical recipe for using decoder‑only LLMs as strong multi‑label classifiers with parameter‑efficient fine‑tuning (PEFT).
> **Note:** This card describes a *training pipeline + example checkpoints*. If you push a specific checkpoint to the Hub, please fill in exact dataset splits, metrics, and license at upload time.
---
## Model Details
### Model Description
This project provides a **modular training & inference stack** for multi‑label text classification built on top of **Hugging Face Transformers** and **PEFT**. It adapts **decoder‑only** LLMs (tested with `meta-llama/Llama-3.2-1B`) using **LoRA** adapters, and optionally enables **4‑bit quantization** (QLoRA‑style) for reduced memory footprint during training and inference. The repository exposes a **single CLI** for train/eval/predict and a **YAML configuration** to control data paths, model choice, and hyperparameters.
- **Developed by:** Amirhossein Yousefi (GitHub: `amirhossein-yousefi`; Hugging Face: `Amirhossein75`)
- **Model type:** Decoder‑only causal LM with PEFT (LoRA) for multi‑label classification
- **Language(s):** English (evaluated on AmazonCat‑13K subset)
- **License:** The **base model** (`meta-llama/Llama-3.2-1B`) is under the **Llama 3.2 Community License**. The LoRA adapter you publish should declare its own license and acknowledge base‑model terms.
- **Finetuned from:** `meta-llama/Llama-3.2-1B` (foundation)
### Model Sources
- **Repository:** https://github.com/amirhossein-yousefi/LLM-Decoder-Tuning-Text-Classification
- **Model (Hub placeholder):** https://huggingface.co/Amirhossein75/LLM-Decoder-Tuning-Text-Classification
- **Background reading:**
  - LoRA: Low‑Rank Adaptation of Large Language Models (Hu et al., 2021)
  - QLoRA: Efficient Finetuning of Quantized LLMs (Dettmers et al., 2023)
  - PEFT documentation (Hugging Face)
---
## Uses
### Direct Use
- **Multi‑label text classification** on English corpora (e.g., product tagging, topic tagging, content routing).
- Inference via:
  - the provided **CLI** (`python -m llm_cls.cli predict --config ...`), which produces JSONL predictions.
  - Hugging Face pipelines with the base model + LoRA adapter loaded (see “How to Get Started”).
### Downstream Use
- **Domain transfer:** Re‑train on your domain labels by pointing the YAML to your CSVs.
- **Backbone swap:** Replace `model.model_name` in the config to try other decoders or encoders (set `use_4bit=false` for encoders).
### Out‑of‑Scope Use
- Safety‑critical decisions without human oversight.
- Tasks requiring **extreme multi‑label** scaling (e.g., hundreds of thousands of labels) without additional adaptation.
- Non‑English or code‑mixed data without validation.
- Any use that conflicts with the base model’s license and acceptable‑use policies.
---
## Bias, Risks, and Limitations
- **Dataset bias:** AmazonCat‑13K originates from product data; labels and text reflect marketplace distributions and may encode demographic or topical biases.
- **Multi‑label long tail:** Minority classes are harder; macro‑F1 often trails micro‑F1. Consider class weighting, augmentation, or threshold tuning.
- **Decoder framing:** Treating classification as generation can be sensitive to prompt/format and decoding thresholds.
- **License & usage constraints:** Ensure compliance with the Llama 3.2 Community License for derivatives and deployment.
### Recommendations
- Track **micro‑ and macro‑F1** and per‑class metrics.
- Use **threshold tuning** on validation to balance precision/recall per class (see the sketch after this list).
- For memory‑constrained environments, prefer **4‑bit + LoRA**; otherwise disable 4‑bit on platforms without `bitsandbytes` support.
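As a concrete illustration of the threshold‑tuning recommendation, here is a minimal sketch that picks one decision threshold per label on a validation split. The `val_probs`/`val_labels` names and the random stand‑in data are purely illustrative and not part of the repo; swap in your real validation probabilities and binary targets.

```python
import numpy as np
from sklearn.metrics import f1_score

def tune_thresholds(val_probs: np.ndarray, val_labels: np.ndarray) -> np.ndarray:
    """Pick, for each label column, the threshold that maximizes its F1 on validation data."""
    grid = np.linspace(0.05, 0.95, 19)
    thresholds = np.zeros(val_probs.shape[1])
    for j in range(val_probs.shape[1]):
        scores = [f1_score(val_labels[:, j], val_probs[:, j] >= t, zero_division=0) for t in grid]
        thresholds[j] = grid[int(np.argmax(scores))]
    return thresholds

# Stand-in data: 64 validation samples, 10 labels (replace with real model outputs).
rng = np.random.default_rng(0)
val_probs = rng.random((64, 10))
val_labels = (rng.random((64, 10)) > 0.7).astype(int)
print(tune_thresholds(val_probs, val_labels))
```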
---
## How to Get Started with the Model
Below is an example of loading a base Llama model with a LoRA adapter for classification‑style inference. Replace `BASE_MODEL` and `ADAPTER_REPO` with your IDs.
```python
from transformers import AutoTokenizer, AutoModelForCausalLM, TextGenerationPipeline
from peft import PeftModel
import torch
BASE_MODEL = "meta-llama/Llama-3.2-1B"
ADAPTER_REPO = "Amirhossein75/LLM-Decoder-Tuning-Text-Classification" # or your own adapter
tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL, use_fast=True)
base = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model = PeftModel.from_pretrained(base, ADAPTER_REPO)
model.eval()
# Simple prompt format for multi-label classification (adjust to your training format).
labels = ["books","movies_tv","music","pop","literature_fiction","movies","education_reference","rock","used_rental_textbooks","new"]
text = "A thrilling space opera with deep character arcs and rich world-building."
prompt = (
    "You are a classifier. Given the text, return a JSON list of applicable labels from this set: "
    + ", ".join(labels) + ".\n"
    + f"Text: {text}\nLabels: "
)
# The model was loaded with device_map="auto", so don't pass a `device` argument here.
pipe = TextGenerationPipeline(model=model, tokenizer=tokenizer)
out = pipe(prompt, max_new_tokens=64, do_sample=False)
print(out[0]["generated_text"])
```
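If you use the generation‑style prompt above, you still have to turn the free‑form continuation back into a label set. The helper below is a hypothetical, best‑effort parser (the `parse_labels` name and fallback logic are mine, not part of the repo); adapt it to whatever output format you actually trained with.

```python
import json

def parse_labels(generated: str, prompt: str, label_set: list[str]) -> list[str]:
    """Best-effort parsing of the generated continuation into a list of known labels."""
    # The pipeline returns prompt + continuation by default; strip the prompt part.
    continuation = generated[len(prompt):] if generated.startswith(prompt) else generated
    try:
        predicted = json.loads(continuation.strip().splitlines()[0])
    except (json.JSONDecodeError, IndexError):
        predicted = []
    if not isinstance(predicted, list):
        predicted = []
    # Fall back to substring matching if JSON parsing yielded nothing usable.
    if not predicted:
        predicted = [label for label in label_set if label in continuation]
    return [label for label in predicted if label in label_set]

print(parse_labels(out[0]["generated_text"], prompt, labels))
```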
For **CLI usage**:
```bash
# Train
python -m llm_cls.cli train --config configs/default.yaml
# Predict
python -m llm_cls.cli predict --config configs/default.yaml --input_csv data/test.csv --output_jsonl preds.jsonl
```
---
## Training Details
### Training Data
- **Dataset:** AmazonCat‑13K (example subset; top‑10 categories for illustration). If you use the full dataset, update CSV paths and label columns accordingly.
- **Format:** CSV with at least a text column and one or more label columns (multi‑label). Configure the column names in `configs/default.yaml` (a minimal example follows this list).
- **Splits:** Train / Validation / (Optional) Test; sample scripts are provided to create CSV splits.
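For illustration, a minimal sketch of the kind of CSV layout this setup can consume: one text column plus one binary (0/1) column per label. The column names and file name below are assumptions, not the repo's canonical schema; they must match whatever you declare in `configs/default.yaml`.

```python
import pandas as pd

# Hypothetical multi-label CSV: one text column plus one 0/1 column per label.
df = pd.DataFrame(
    {
        "text": [
            "A thrilling space opera with deep character arcs.",
            "Acoustic covers of classic rock anthems.",
        ],
        "books": [1, 0],
        "literature_fiction": [1, 0],
        "music": [0, 1],
        "rock": [0, 1],
    }
)
df.to_csv("train.csv", index=False)

# Sanity-check the split before pointing the YAML config at it.
print(pd.read_csv("train.csv").head())
```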
### Training Procedure
#### Preprocessing
- Tokenization with the base model’s tokenizer.
- Optional script to prepare AmazonCat‑13K CSVs (see `split_amazon_13k_data.py` in the repo).
#### Training Hyperparameters (illustrative config)
- **Base model:** `meta-llama/Llama-3.2-1B`
- **Problem type:** `multi_label_classification`
- **Precision / quantization:** `use_4bit: true` (QLoRA‑style); `torch_dtype: bfloat16` for computation
- **LoRA:** `r=2`, `alpha=2`, `dropout=0.05` (a matching configuration sketch follows this list)
- **LoRA target modules:** `["q_proj","k_proj","v_proj","o_proj","gate_proj","down_proj","up_proj"]`
- **Batch size:** `4` (with `gradient_accumulation_steps=8`)
- **Max length:** `1024`
- **Optimizer:** 8‑bit optimizer when quantized (`optim_8bit_when_4bit: true`)
- **Epochs:** up to `20` with early stopping (`patience=2`)
- **Metric for best model:** `f1_micro`
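For reference, here is a sketch of `transformers`/`peft` configuration objects that mirror the numbers above. The `nf4` quant type, double quantization, and `TaskType.SEQ_CLS` are my assumptions rather than values stated in the repo; treat `configs/default.yaml` as the source of truth.

```python
import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig, TaskType

# 4-bit (QLoRA-style) quantization of the frozen backbone; compute in bfloat16.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_quant_type="nf4",       # assumption: the repo may use a different quant type
    bnb_4bit_use_double_quant=True,  # assumption
)

# LoRA adapter settings matching the hyperparameters listed above.
lora_config = LoraConfig(
    task_type=TaskType.SEQ_CLS,      # assumption: multi-label sequence classification
    r=2,
    lora_alpha=2,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "down_proj", "up_proj"],
)
```

Typically you would pass `quantization_config=bnb_config` to `from_pretrained(...)` and wrap the resulting model with `peft.get_peft_model(model, lora_config)`; check the repo's model‑building code for the exact wiring.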
#### Speeds, Sizes, Times (example run)
- **Device:** NVIDIA GeForce RTX 3080 Ti Laptop GPU (16 GB VRAM)
- **Runtime:** ~1,310 s for the shortest run (4 epochs), up to ~1,984 s at 6 epochs (see the per‑run table below)
- **Throughput:** ≈0.784 steps/s (≈24.9 samples/s) on average during training
- **Artifacts:** Reproducible outputs under `outputs/<model_name>/<dataset_name>/run_<i>/`
---
## Evaluation
### Testing Data, Factors & Metrics
- **Testing data:** Held‑out split from AmazonCat‑13K (example subset).
- **Factors:** Evaluate both **micro‑F1** (overall) and **macro‑F1** (per‑class average) to reflect long‑tail performance.
- **Metrics:** `f1_micro`, `f1_macro`, eval loss, throughput (steps/s, samples/s). A toy micro‑ vs. macro‑F1 comparison follows this list.
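As a toy illustration of how the two averages can diverge on long‑tailed labels (the arrays below are made up, not real predictions):

```python
import numpy as np
from sklearn.metrics import f1_score

# Rows = samples, columns = labels; the last label is rare and entirely missed.
y_true = np.array([[1, 0, 1], [0, 1, 0], [1, 1, 0], [1, 0, 0]])
y_pred = np.array([[1, 0, 0], [0, 1, 0], [1, 0, 0], [1, 0, 0]])

print("f1_micro:", f1_score(y_true, y_pred, average="micro", zero_division=0))  # ~0.80
print("f1_macro:", f1_score(y_true, y_pred, average="macro", zero_division=0))  # ~0.56
```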
### Results
- **Best overall (micro-F1):** **0.830** at **5 epochs**
- **Best minority‑class sensitivity (macro-F1):** **0.752** at **6 epochs**
- **Average across 4 runs:** micro‑F1 **0.824**, macro‑F1 **0.741**, eval loss **0.161**
- **Throughput:** train ≈ **0.784 steps/s** (**24.9 samples/s**); eval time ≈ **34.0 s** per run.
> Interpretation: going from **4 → 5 epochs** gives the best **micro‑F1**; **6 epochs** squeezes out the top **macro‑F1**, hinting at slightly better coverage of minority classes with a tiny trade‑off in micro‑F1.
---
### 📈 Per‑run metrics
| Run | Epochs | Train Loss | Eval Loss | F1 (micro) | F1 (macro) | Train Time (s) | Train steps/s | Train samples/s | Eval Time (s) |
|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| 1 | 4 | 1.400 | 0.157 | 0.824 | 0.738 | 1309.6 | 0.962 | 30.543 | 33.6 |
| 2 | 5 | 1.220 | 0.159 | 0.830 | 0.743 | 1640.3 | 0.768 | 24.385 | 34.0 |
| 3 | 6 | 1.063 | 0.162 | 0.826 | 0.752 | 1984.2 | 0.635 | 20.159 | 34.4 |
| 4 | 5 | 1.265 | 0.165 | 0.816 | 0.729 | 1639.3 | 0.769 | 24.401 | 34.0 |
<sub>*F1(micro)* aggregates decisions over all samples; *F1(macro)* averages per‑class F1 equally, highlighting minority‑class performance.</sub>
#### Summary
Decoder‑only LLMs with **LoRA** adapters provide competitive multi‑label performance with small memory/compute budgets. Slightly longer training (5–6 epochs) can improve macro‑F1, capturing more minority labels with minimal micro‑F1 trade‑off.
---
## Model Examination
- Inspect confidence/threshold curves per label to tune decision thresholds.
- Use error analysis on false negatives for long‑tail labels; consider reweighting or augmentation.
---
## Environmental Impact
- **Hardware Type:** Single laptop GPU (RTX 3080 Ti Laptop, 16 GB)
- **Hours used:** ~0.36 hours for the shortest run (4 epochs), up to ~0.55 hours at 6 epochs
---
## Technical Specifications
### Model Architecture and Objective
- **Architecture:** Decoder‑only Transformer (Llama 3.2 class), adapted via **LoRA**.
- **Objective:** Multi‑label classification formulated as conditional generation with sigmoid/thresholding for label decisions (see the sketch below).
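A minimal sketch of that decision step, assuming you have per‑label logits (for example from a sequence‑classification head); the label names, logit values, and the flat 0.5 threshold are placeholders, and in practice the threshold would be the per‑label value tuned on validation data.

```python
import torch

LABELS = ["books", "movies_tv", "music", "pop", "literature_fiction"]

# Stand-in per-label logits for two texts (replace with real model outputs).
logits = torch.tensor([[2.1, -1.3, 0.4, -0.2, 1.8],
                       [-0.9, 1.7, 2.3, 0.6, -1.1]])

probs = torch.sigmoid(logits)  # independent probability per label
decisions = probs >= 0.5       # threshold each label independently

for row in decisions:
    print([label for label, keep in zip(LABELS, row.tolist()) if keep])
```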
### Compute Infrastructure
#### Hardware
- Laptop with NVIDIA GeForce RTX 3080 Ti (laptop) GPU, 16 GB VRAM.
#### Software
- Python, PyTorch, Hugging Face Transformers, PEFT, (optional) bitsandbytes for 4‑bit.
---
## Citation
If you use this work, please consider citing the following:
**BibTeX:**
```bibtex
@article{Hu2021LoRA,
title={LoRA: Low-Rank Adaptation of Large Language Models},
author={Edward J. Hu and Yelong Shen and Phillip Wallis and Zeyuan Allen-Zhu and Yuanzhi Li and Shean Wang and Lu Wang and Weizhu Chen},
journal={arXiv preprint arXiv:2106.09685},
year={2021}
}
@article{Dettmers2023QLoRA,
title={QLoRA: Efficient Finetuning of Quantized LLMs},
author={Tim Dettmers and Artidoro Pagnoni and Ari Holtzman and Luke Zettlemoyer},
journal={arXiv preprint arXiv:2305.14314},
year={2023}
}
```
**APA:**
- Hu, E. J., Shen, Y., Wallis, P., Allen‑Zhu, Z., Li, Y., Wang, S., Wang, L., & Chen, W. (2021). *LoRA: Low‑Rank Adaptation of Large Language Models*. arXiv:2106.09685.
- Dettmers, T., Pagnoni, A., Holtzman, A., & Zettlemoyer, L. (2023). *QLoRA: Efficient Finetuning of Quantized LLMs*. arXiv:2305.14314.
---
## Glossary
- **LoRA:** Low‑Rank Adaptation; injects small trainable matrices into a frozen backbone to adapt it efficiently.
- **QLoRA (4‑bit):** Finetuning with the backbone quantized to 4‑bit precision, training only LoRA adapters.
- **Micro‑/Macro‑F1:** Micro aggregates over all instances; Macro averages over classes equally (sensitive to minority classes).
---
## More Information
- The repo ships a minimal CLI (`llm_cls/cli.py`) and example YAML config (`configs/default.yaml`) to reproduce results.
- For non‑Linux environments or if `bitsandbytes` is unavailable, disable 4‑bit and train in standard precision.
---
## Model Card Authors
- **Author/Maintainer:** Amirhossein Yousefi (`amirhossein-yousefi` / `Amirhossein75`)
## Model Card Contact
- Open an issue in the GitHub repository or contact the Hugging Face user `Amirhossein75`.