---
# For reference on model card metadata, see the spec: https://github.com/huggingface/hub-docs/blob/main/modelcard.md?plain=1
# Doc / guide: https://huggingface.co/docs/hub/model-cards
datasets:
  - "pietrolesci/amazoncat-13k"
language:
  - en
library_name: transformers
license: other  # Base model (Meta Llama 3.2) is under the Llama 3.2 Community License
pipeline_tag: text-classification
tags:
  - multi-label
  - LoRA
  - QLoRA
  - bitsandbytes
  - decoder-only
  - llama-3.2-1b
  - peft
  - text-classification
base_model: meta-llama/Llama-3.2-1B
---

# Model Card for Amirhossein75/LLM-Decoder-Tuning-Text-Classification

> **One‑line summary:** Decoder‑only LLMs (e.g., Llama‑3.2‑1B) fine‑tuned for **multi‑label text classification** using **LoRA** adapters, with optional **4‑bit QLoRA** quantization for memory‑efficient training and inference. A clean CLI and YAML config make it easy to reproduce results and swap backbones.

This model card accompanies the repository **LLM‑Decoder‑Tuning‑Text‑Classification** and documents a practical recipe for using decoder‑only LLMs as strong multi‑label classifiers with parameter‑efficient fine‑tuning (PEFT).

> **Note:** This card describes a *training pipeline + example checkpoints*. If you push a specific checkpoint to the Hub, please fill in exact dataset splits, metrics, and license at upload time.

---

## Model Details

### Model Description

This project provides a **modular training & inference stack** for multi‑label text classification built on top of **Hugging Face Transformers** and **PEFT**. It adapts **decoder‑only** LLMs (tested with `meta-llama/Llama-3.2-1B`) using **LoRA** adapters, and optionally enables **4‑bit quantization** (QLoRA‑style) for reduced memory footprint during training and inference. The repository exposes a **single CLI** for train/eval/predict and a **YAML configuration** to control data paths, model choice, and hyperparameters.

- **Developed by:** Amirhossein Yousefi (GitHub: `amirhossein-yousefi`; Hugging Face: `Amirhossein75`)
- **Model type:** Decoder‑only causal LM with PEFT (LoRA) for multi‑label classification
- **Language(s):** English (evaluated on an AmazonCat‑13K subset)
- **License:** The **base model** (`meta-llama/Llama-3.2-1B`) is under the **Llama 3.2 Community License**. The LoRA adapter you publish should declare its own license and acknowledge base‑model terms.
- **Finetuned from:** `meta-llama/Llama-3.2-1B` (foundation)

### Model Sources

- **Repository:** https://github.com/amirhossein-yousefi/LLM-Decoder-Tuning-Text-Classification
- **Model (Hub placeholder):** https://huggingface.co/Amirhossein75/LLM-Decoder-Tuning-Text-Classification
- **Background reading:**
  - LoRA: Low‑Rank Adaptation of Large Language Models (Hu et al., 2021)
  - QLoRA: Efficient Finetuning of Quantized LLMs (Dettmers et al., 2023)
  - PEFT documentation (Hugging Face)

---

## Uses

### Direct Use

- **Multi‑label text classification** on English corpora (e.g., product tagging, topic tagging, content routing).
- Inference via:
  - Provided **CLI** (`python -m llm_cls.cli predict --config ...`) producing JSONL predictions (a sketch for reading this output follows the list).
  - Hugging Face pipelines with base model + LoRA adapter loaded (see “How to Get Started”).
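
The JSONL output schema is defined by the repository's predict command; the sketch below is a minimal, hedged way to consume it, and the field names mentioned in the comment are hypothetical — inspect one record of your own output to confirm.

```python
import json
from pathlib import Path

# One JSON object per line; field names such as "text", "labels", or "scores"
# are illustrative, not guaranteed — check a record from your own preds.jsonl.
records = [
    json.loads(line)
    for line in Path("preds.jsonl").read_text().splitlines()
    if line.strip()
]
print(len(records), "predictions")
print(records[0])
```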

### Downstream Use 

- **Domain transfer:** Re‑train on your domain labels by pointing the YAML to your CSVs.
- **Backbone swap:** Replace `model.model_name` in the config to try other decoders or encoders (set `use_4bit=false` for encoders).

### Out‑of‑Scope Use

- Safety‑critical decisions without human oversight.
- Tasks requiring **extreme multilabel** scaling (e.g., hundreds of thousands of labels) without additional adaptation.
- Non‑English or code‑mixed data without validation.
- Any use that conflicts with the base model’s license and acceptable‑use policies.

---

## Bias, Risks, and Limitations

- **Dataset bias:** AmazonCat‑13K originates from product data; labels and text reflect marketplace distributions and may encode demographic or topical biases.
- **Multi‑label long tail:** Minority classes are harder; macro‑F1 often trails micro‑F1. Consider class weighting, augmentation, or threshold tuning.
- **Decoder framing:** Treating classification as generation can be sensitive to prompt/format and decoding thresholds.
- **License & usage constraints:** Ensure compliance with the Llama 3.2 Community License for derivatives and deployment.

### Recommendations

- Track **micro‑ and macro‑F1** and per‑class metrics.
- Use **threshold tuning** on validation to balance precision/recall per class (see the sketch after this list).
- For memory‑constrained environments, prefer **4‑bit + LoRA**; otherwise disable 4‑bit on platforms without `bitsandbytes` support.
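
A minimal sketch of the per‑class threshold tuning recommended above, assuming you already have validation‑set probabilities and a binary ground‑truth matrix (the names and the search grid are illustrative):

```python
import numpy as np
from sklearn.metrics import f1_score

def tune_thresholds(val_probs: np.ndarray, val_true: np.ndarray,
                    grid=np.linspace(0.05, 0.95, 19)) -> np.ndarray:
    """Pick, per class, the decision threshold that maximizes F1 on validation."""
    n_classes = val_true.shape[1]
    thresholds = np.full(n_classes, 0.5)
    for c in range(n_classes):
        best_f1 = -1.0
        for t in grid:
            f1 = f1_score(val_true[:, c], (val_probs[:, c] >= t).astype(int), zero_division=0)
            if f1 > best_f1:
                best_f1, thresholds[c] = f1, t
    return thresholds

# Usage: test_preds = (test_probs >= tune_thresholds(val_probs, val_true)).astype(int)
```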

---

## How to Get Started with the Model

Below is an example of loading a base Llama model with a LoRA adapter for classification‑style inference. Replace `BASE_MODEL` and `ADAPTER_REPO` with your IDs.

```python
from transformers import AutoTokenizer, AutoModelForCausalLM, TextGenerationPipeline
from peft import PeftModel
import torch

BASE_MODEL = "meta-llama/Llama-3.2-1B"
ADAPTER_REPO = "Amirhossein75/LLM-Decoder-Tuning-Text-Classification"  # or your own adapter

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL, use_fast=True)
base = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model = PeftModel.from_pretrained(base, ADAPTER_REPO)
model.eval()

# Simple prompt format for multi-label classification (adjust to your training format).
labels = ["books","movies_tv","music","pop","literature_fiction","movies","education_reference","rock","used_rental_textbooks","new"]
text = "A thrilling space opera with deep character arcs and rich world-building."

prompt = (
    "You are a classifier. Given the text, return a JSON list of applicable labels from this set: "
    + ", ".join(labels) + ".\n"
    + f"Text: {text}\nLabels: "
)

# device_map="auto" has already placed the model, so no explicit `device` is passed here.
pipe = TextGenerationPipeline(model=model, tokenizer=tokenizer)
out = pipe(prompt, max_new_tokens=64, do_sample=False, return_full_text=False)
print(out[0]["generated_text"])  # only the newly generated labels, not the echoed prompt
```

For **CLI usage**:

```bash
# Train
python -m llm_cls.cli train --config configs/default.yaml

# Predict
python -m llm_cls.cli predict \
  --config configs/default.yaml \
  --input_csv data/test.csv \
  --output_jsonl preds.jsonl
```

---

## Training Details

### Training Data

- **Dataset:** AmazonCat‑13K (example subset; top‑10 categories for illustration). If you use the full dataset, update CSV paths and label columns accordingly.
- **Format:** CSV with at least a text column and one or more label columns (multi‑label). Configure names in `configs/default.yaml`; an illustrative layout is sketched after this list.
- **Splits:** Train / Validation / (Optional) Test; sample scripts are provided to create CSV splits.
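
For orientation, one plausible layout of such a CSV is sketched below; the column names are examples only, and the authoritative names are whatever you set in `configs/default.yaml`.

```python
import pandas as pd

# Illustrative multi-label CSV: one text column plus one binary column per label
# (a row may carry several 1s). Column names here are examples only.
df = pd.DataFrame({
    "text": [
        "A thrilling space opera with deep character arcs.",
        "Introductory linear algebra textbook with exercises.",
    ],
    "books": [1, 1],
    "literature_fiction": [1, 0],
    "education_reference": [0, 1],
})
df.to_csv("train.csv", index=False)  # point the YAML config at files like this
```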

### Training Procedure

#### Preprocessing

- Tokenization with the base model’s tokenizer (a minimal sketch follows this list).
- Optional script to prepare AmazonCat‑13K CSVs (see `split_amazon_13k_data.py` in the repo).
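
A minimal tokenization sketch consistent with the max length listed below; Llama tokenizers ship without a pad token, so a common workaround is shown, although the repository's own preprocessing may handle this differently.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-1B", use_fast=True)
if tokenizer.pad_token is None:
    # Llama tokenizers have no pad token by default; reusing EOS is one common choice.
    tokenizer.pad_token = tokenizer.eos_token

batch = tokenizer(
    ["A thrilling space opera with deep character arcs."],
    truncation=True,
    max_length=1024,  # matches the max length in the hyperparameters below
    padding=True,
    return_tensors="pt",
)
print(batch["input_ids"].shape)
```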

#### Training Hyperparameters (illustrative config)

- **Base model:** `meta-llama/Llama-3.2-1B`
- **Problem type:** `multi_label_classification`
- **Precision / quantization:** `use_4bit: true` (QLoRA‑style); `torch_dtype: bfloat16` for computation
- **LoRA:** `r=2`, `alpha=2`, `dropout=0.05`
- **LoRA target modules:** `["q_proj","k_proj","v_proj","o_proj","gate_proj","down_proj","up_proj"]`
- **Batch size:** `4` (with `gradient_accumulation_steps=8`)
- **Max length:** `1024`
- **Optimizer:** 8‑bit optimizer when quantized (`optim_8bit_when_4bit: true`)
- **Epochs:** up to `20` with early stopping (`patience=2`)
- **Metric for best model:** `f1_micro`
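
The settings above map roughly onto PEFT and bitsandbytes as in the sketch below. This is not the repository's exact training code; details such as the 4‑bit quant type are assumptions.

```python
import torch
from transformers import AutoModelForSequenceClassification, BitsAndBytesConfig
from peft import LoraConfig, TaskType, get_peft_model

# 4-bit (QLoRA-style) loading; nf4 is a common choice but is an assumption here.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForSequenceClassification.from_pretrained(
    "meta-llama/Llama-3.2-1B",
    num_labels=10,  # e.g. the top-10 AmazonCat-13K categories
    problem_type="multi_label_classification",
    quantization_config=bnb_config,
    device_map="auto",
)

# In practice, peft.prepare_model_for_kbit_training(model) is often applied here.
lora_config = LoraConfig(
    task_type=TaskType.SEQ_CLS,
    r=2,
    lora_alpha=2,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "down_proj", "up_proj"],
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```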

#### Speeds, Sizes, Times (example run)

- **Device:** NVIDIA GeForce RTX 3080 Ti Laptop GPU (16 GB VRAM)
- **Runtime:** ~1,310 seconds for the 4‑epoch run (the fastest); up to ~1,984 seconds for 6 epochs (see the per‑run table under Evaluation)
- **Throughput:** ≈0.784 steps/s (≈24.9 samples/s) on average during training
- **Artifacts:** Reproducible outputs under `outputs/<model_name>/<dataset_name>/run_<i>/`

---

## Evaluation

### Testing Data, Factors & Metrics

- **Testing data:** Held‑out split from AmazonCat‑13K (example subset).
- **Factors:** Evaluate both **micro‑F1** (overall) and **macro‑F1** (per‑class average) to reflect long‑tail performance.
- **Metrics:** `f1_micro`, `f1_macro`, eval loss, throughput (steps/s, samples/s).

### Metrics

- **Best overall (micro-F1):** **0.830** at **5 epochs**  
- **Best minority‑class sensitivity (macro-F1):** **0.752** at **6 epochs**  
- **Average across 4 runs:** micro‑F1 **0.824**, macro‑F1 **0.741**, eval loss **0.161**  
- **Throughput:** train ≈ **0.784 steps/s** (**24.9 samples/s**) on average; eval time ≈ **34.0 s** per run.

> Interpretation: going from **4 → 5 epochs** gives the best **micro‑F1**; **6 epochs** squeezes out the top **macro‑F1**, hinting at slightly better coverage of minority classes with a tiny trade‑off in micro‑F1.

---
### 📈 Per‑run metrics
| Run | Epochs | Train Loss | Eval Loss | F1 (micro) | F1 (macro) | Train Time (s) | Train steps/s | Train samples/s | Eval Time (s) |
|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| 1 | 4 | 1.400 | 0.157 | 0.824 | 0.738 | 1309.6 | 0.962 | 30.543 | 33.6 |
| 2 | 5 | 1.220 | 0.159 | 0.830 | 0.743 | 1640.3 | 0.768 | 24.385 | 34.0 |
| 3 | 6 | 1.063 | 0.162 | 0.826 | 0.752 | 1984.2 | 0.635 | 20.159 | 34.4 |
| 4 | 5 | 1.265 | 0.165 | 0.816 | 0.729 | 1639.3 | 0.769 | 24.401 | 34.0 |

<sub>*F1(micro)* aggregates decisions over all samples; *F1(macro)* averages per‑class F1 equally, highlighting minority‑class performance.</sub>
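
As a concrete reference for the two aggregations, a toy computation with scikit‑learn on binary indicator matrices (values below are made up for illustration):

```python
import numpy as np
from sklearn.metrics import f1_score

# Rows = samples, columns = labels; values are illustrative only.
y_true = np.array([[1, 0, 1], [0, 1, 0], [1, 1, 0]])
y_pred = np.array([[1, 0, 0], [0, 1, 0], [1, 0, 0]])

print("micro-F1:", f1_score(y_true, y_pred, average="micro"))  # pools all label decisions
print("macro-F1:", f1_score(y_true, y_pred, average="macro"))  # unweighted mean over labels
```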

### Results (example)

See the headline metrics and the per‑run table above; in short, 5 epochs maximizes micro‑F1 (0.830) and 6 epochs maximizes macro‑F1 (0.752).

#### Summary

Decoder‑only LLMs with **LoRA** adapters provide competitive multi‑label performance with small memory/compute budgets. Slightly longer training (5–6 epochs) can improve macro‑F1, capturing more minority labels with minimal micro‑F1 trade‑off.

---

## Model Examination 

- Inspect confidence/threshold curves per label to tune decision thresholds.
- Use error analysis on false negatives for long‑tail labels; consider reweighting or augmentation.
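
A small sketch of the false‑negative analysis suggested above, using toy indicator matrices; in practice, substitute your own label names, ground truth, and predictions.

```python
import numpy as np

label_names = ["books", "music", "rock"]  # illustrative labels
y_true = np.array([[1, 0, 1], [1, 1, 0], [0, 0, 1]])
y_pred = np.array([[1, 0, 0], [1, 0, 0], [0, 0, 0]])

false_negatives = ((y_true == 1) & (y_pred == 0)).sum(axis=0)
support = y_true.sum(axis=0)
for name, fn, sup in sorted(zip(label_names, false_negatives, support),
                            key=lambda t: -int(t[1])):
    print(f"{name}: missed {fn} of {sup} positive examples")
```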

---

## Environmental Impact

- **Hardware Type:** Single laptop GPU (RTX 3080 Ti Laptop, 16 GB)
- **Hours used (example run):** ~0.36 hours

---

## Technical Specifications 

### Model Architecture and Objective

- **Architecture:** Decoder‑only Transformer (Llama 3.2 class), adapted via **LoRA**.
- **Objective:** Multi‑label classification formulated as conditional generation with sigmoid/thresholding for label decisions.
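
A minimal sketch of the sigmoid/threshold decision step, assuming per‑label logits are available (e.g. from a sequence‑classification head); 0.5 is only the default threshold and benefits from per‑class tuning.

```python
import torch

# One row per example, one column per label; values are illustrative.
logits = torch.tensor([[2.1, -0.3, 0.8],
                       [-1.2, 0.4, 1.7]])
probs = torch.sigmoid(logits)      # independent per-label probabilities
decisions = (probs >= 0.5).int()   # default threshold; tune per label on validation
print(decisions)
```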

### Compute Infrastructure

#### Hardware

- Laptop with NVIDIA GeForce RTX 3080 Ti (laptop) GPU, 16 GB VRAM.

#### Software

- Python, PyTorch, Hugging Face Transformers, PEFT, (optional) bitsandbytes for 4‑bit.

---

## Citation 

If you use this work, please consider citing the following:

**BibTeX:**

```bibtex
@article{Hu2021LoRA,
  title={LoRA: Low-Rank Adaptation of Large Language Models},
  author={Edward J. Hu and Yelong Shen and Phillip Wallis and Zeyuan Allen-Zhu and Yuanzhi Li and Shean Wang and Lu Wang and Weizhu Chen},
  journal={arXiv preprint arXiv:2106.09685},
  year={2021}
}

@article{Dettmers2023QLoRA,
  title={QLoRA: Efficient Finetuning of Quantized LLMs},
  author={Tim Dettmers and Artidoro Pagnoni and Ari Holtzman and Luke Zettlemoyer},
  journal={arXiv preprint arXiv:2305.14314},
  year={2023}
}
```

**APA:**

- Hu, E. J., Shen, Y., Wallis, P., Allen‑Zhu, Z., Li, Y., Wang, S., Wang, L., & Chen, W. (2021). *LoRA: Low‑Rank Adaptation of Large Language Models*. arXiv:2106.09685.
- Dettmers, T., Pagnoni, A., Holtzman, A., & Zettlemoyer, L. (2023). *QLoRA: Efficient Finetuning of Quantized LLMs*. arXiv:2305.14314.

---

## Glossary 

- **LoRA:** Low‑Rank Adaptation; injects small trainable matrices into a frozen backbone to adapt it efficiently.
- **QLoRA (4‑bit):** Finetuning with the backbone quantized to 4‑bit precision, training only LoRA adapters.
- **Micro‑/Macro‑F1:** Micro aggregates over all instances; Macro averages over classes equally (sensitive to minority classes).

---

## More Information 

- The repo ships a minimal CLI (`llm_cls/cli.py`) and example YAML config (`configs/default.yaml`) to reproduce results.
- For non‑Linux environments or if `bitsandbytes` is unavailable, disable 4‑bit and train in standard precision.

---

## Model Card Authors 

- **Author/Maintainer:** Amirhossein Yousefi (`amirhossein-yousefi` / `Amirhossein75`)

## Model Card Contact

- Open an issue in the GitHub repository or contact the Hugging Face user `Amirhossein75`.