---
# For reference on model card metadata, see the spec: https://github.com/huggingface/hub-docs/blob/main/modelcard.md?plain=1
# Doc / guide: https://huggingface.co/docs/hub/model-cards
datasets:
- "pietrolesci/amazoncat-13k"
language:
- en
library_name: transformers
license: other # Base model (Meta Llama 3.2) is under the Llama 3.2 Community License
pipeline_tag: text-classification
tags:
- multi-label
- LoRA
- QLoRA
- bitsandbytes
- decoder-only
- llama-3.2-1b
- peft
- text-classification
base_model: meta-llama/Llama-3.2-1B
---

# Model Card for Amirhossein75/LLM-Decoder-Tuning-Text-Classification

> **One‑line summary:** Decoder‑only LLMs (e.g., Llama‑3.2‑1B) fine‑tuned for **multi‑label text classification** using **LoRA** adapters, with optional **4‑bit QLoRA** quantization for memory‑efficient training and inference. A clean CLI and YAML config make it easy to reproduce results and swap backbones.

This model card accompanies the repository **LLM‑Decoder‑Tuning‑Text‑Classification** and documents a practical recipe for using decoder‑only LLMs as strong multi‑label classifiers with parameter‑efficient fine‑tuning (PEFT).

> **Note:** This card describes a *training pipeline + example checkpoints*. If you push a specific checkpoint to the Hub, please fill in exact dataset splits, metrics, and license at upload time.

---

## Model Details

### Model Description

This project provides a **modular training & inference stack** for multi‑label text classification built on top of **Hugging Face Transformers** and **PEFT**. It adapts **decoder‑only** LLMs (tested with `meta-llama/Llama-3.2-1B`) using **LoRA** adapters, and optionally enables **4‑bit quantization** (QLoRA‑style) for a reduced memory footprint during training and inference. The repository exposes a **single CLI** for train/eval/predict and a **YAML configuration** to control data paths, model choice, and hyperparameters.

- **Developed by:** Amirhossein Yousefi (GitHub: `amirhossein-yousefi`; Hugging Face: `Amirhossein75`)
- **Model type:** Decoder‑only causal LM with PEFT (LoRA) for multi‑label classification
- **Language(s):** English (evaluated on an AmazonCat‑13K subset)
- **License:** The **base model** (`meta-llama/Llama-3.2-1B`) is under the **Llama 3.2 Community License**. The LoRA adapter you publish should declare its own license and acknowledge base‑model terms.
- **Finetuned from:** `meta-llama/Llama-3.2-1B` (foundation)

### Model Sources

- **Repository:** https://github.com/amirhossein-yousefi/LLM-Decoder-Tuning-Text-Classification
- **Model (Hub placeholder):** https://huggingface.co/Amirhossein75/LLM-Decoder-Tuning-Text-Classification
- **Background reading:**
  - LoRA: Low‑Rank Adaptation of Large Language Models (Hu et al., 2021)
  - QLoRA: Efficient Finetuning of Quantized LLMs (Dettmers et al., 2023)
  - PEFT documentation (Hugging Face)

---

## Uses

### Direct Use

- **Multi‑label text classification** on English corpora (e.g., product tagging, topic tagging, content routing).
- Inference via:
  - Provided **CLI** (`python -m llm_cls.cli predict --config ...`) producing JSONL predictions.
  - Hugging Face pipelines with base model + LoRA adapter loaded (see “How to Get Started”).

### Downstream Use

- **Domain transfer:** Re‑train on your domain labels by pointing the YAML to your CSVs.
- **Backbone swap:** Replace `model.model_name` in the config to try other decoders or encoders (set `use_4bit=false` for encoders).

### Out‑of‑Scope Use

- Safety‑critical decisions without human oversight.
- Tasks requiring **extreme multi‑label** scaling (e.g., hundreds of thousands of labels) without additional adaptation.
- Non‑English or code‑mixed data without validation.
- Any use that conflicts with the base model’s license and acceptable‑use policies.

---

## Bias, Risks, and Limitations

- **Dataset bias:** AmazonCat‑13K originates from product data; labels and text reflect marketplace distributions and may encode demographic or topical biases.
- **Multi‑label long tail:** Minority classes are harder; macro‑F1 often trails micro‑F1. Consider class weighting, augmentation, or threshold tuning.
- **Decoder framing:** Treating classification as generation can be sensitive to prompt/format and decoding thresholds.
- **License & usage constraints:** Ensure compliance with the Llama 3.2 Community License for derivatives and deployment.

### Recommendations

- Track **micro‑ and macro‑F1** and per‑class metrics.
- Use **threshold tuning** on validation to balance precision/recall per class (see the sketch after this list).
- For memory‑constrained environments, prefer **4‑bit + LoRA**; otherwise disable 4‑bit on platforms without `bitsandbytes` support.

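The following is a minimal sketch of per‑label threshold tuning on a validation set; it is not part of the repository's CLI, and `val_labels`, `val_probs`, and the 0.05‑step grid are illustrative assumptions.

```python
import numpy as np
from sklearn.metrics import f1_score

def tune_thresholds(val_labels: np.ndarray, val_probs: np.ndarray,
                    grid=np.arange(0.05, 0.95, 0.05)) -> np.ndarray:
    """Pick, for each label independently, the threshold that maximizes F1.

    val_labels: (n_samples, n_labels) binary ground truth
    val_probs:  (n_samples, n_labels) predicted probabilities (e.g., sigmoid outputs)
    """
    thresholds = np.full(val_labels.shape[1], 0.5)
    for j in range(val_labels.shape[1]):
        scores = [f1_score(val_labels[:, j], val_probs[:, j] >= t, zero_division=0) for t in grid]
        thresholds[j] = grid[int(np.argmax(scores))]
    return thresholds

# Apply the tuned thresholds to test-time probabilities:
# preds = (test_probs >= tune_thresholds(val_labels, val_probs)).astype(int)
```
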
---

## How to Get Started with the Model

Below is an example of loading a base Llama model with a LoRA adapter for classification‑style inference. Replace `BASE_MODEL` and `ADAPTER_REPO` with your IDs.

```python
from transformers import AutoTokenizer, AutoModelForCausalLM, TextGenerationPipeline
from peft import PeftModel
import torch

BASE_MODEL = "meta-llama/Llama-3.2-1B"
ADAPTER_REPO = "Amirhossein75/LLM-Decoder-Tuning-Text-Classification"  # or your own adapter

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL, use_fast=True)
base = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model = PeftModel.from_pretrained(base, ADAPTER_REPO)
model.eval()

# Simple prompt format for multi-label classification (adjust to your training format).
labels = ["books","movies_tv","music","pop","literature_fiction","movies","education_reference","rock","used_rental_textbooks","new"]
text = "A thrilling space opera with deep character arcs and rich world-building."

prompt = (
    "You are a classifier. Given the text, return a JSON list of applicable labels from this set: "
    + ", ".join(labels) + ".\n"
    + f"Text: {text}\nLabels: "
)

# The model was loaded with device_map="auto", so accelerate has already placed the
# weights; do not pass a `device` argument to the pipeline in that case.
pipe = TextGenerationPipeline(model=model, tokenizer=tokenizer)
out = pipe(prompt, max_new_tokens=64, do_sample=False)
print(out[0]["generated_text"])
```

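If the model follows the prompt and emits a JSON list after `Labels:`, the predicted labels can be recovered with a small parsing step. This sketch continues the example above and assumes the default pipeline behavior of returning the prompt plus the completion; real outputs depend on how the adapter was trained.

```python
import json

def parse_labels(generated_text: str, prompt: str, allowed: list) -> list:
    """Strip the prompt, try to read a JSON list, and fall back to comma splitting."""
    completion = generated_text[len(prompt):].strip()
    try:
        predicted = json.loads(completion.splitlines()[0])
    except (json.JSONDecodeError, IndexError):
        predicted = [part.strip() for part in completion.split(",")]
    # Keep only labels from the allowed set to guard against free-form generations.
    return [label for label in predicted if label in allowed]

print(parse_labels(out[0]["generated_text"], prompt, labels))
```
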
For **CLI usage**:

```bash
# Train
python -m llm_cls.cli train --config configs/default.yaml

# Predict
python -m llm_cls.cli predict --config configs/default.yaml --input_csv data/test.csv --output_jsonl preds.jsonl
```

---

## Training Details

### Training Data

- **Dataset:** AmazonCat‑13K (example subset; top‑10 categories for illustration). If you use the full dataset, update CSV paths and label columns accordingly.
- **Format:** CSV with at least a text column and one or more label columns (multi‑label). Configure the column names in `configs/default.yaml`; a hypothetical example is sketched below this list.
- **Splits:** Train / Validation / (Optional) Test; sample scripts are provided to create CSV splits.

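For illustration only, a tiny script that writes a CSV in this general shape, with a text column and one binary column per label. The column names, label set, and file name are placeholders rather than the repository's actual schema; use whatever you configure in `configs/default.yaml`.

```python
import pandas as pd

# Hypothetical multi-label CSV: one text column plus one 0/1 column per label.
df = pd.DataFrame(
    {
        "text": [
            "A thrilling space opera with deep character arcs.",
            "Acoustic covers of classic rock anthems.",
        ],
        "books": [1, 0],
        "music": [0, 1],
        "rock": [0, 1],
    }
)
df.to_csv("train.csv", index=False)  # point the YAML's data paths at files like this
```
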
### Training Procedure

#### Preprocessing

- Tokenization with the base model’s tokenizer.
- Optional script to prepare AmazonCat‑13K CSVs (see `split_amazon_13k_data.py` in the repo).

#### Training Hyperparameters (illustrative config)

- **Base model:** `meta-llama/Llama-3.2-1B`
- **Problem type:** `multi_label_classification`
- **Precision / quantization:** `use_4bit: true` (QLoRA‑style); `torch_dtype: bfloat16` for computation
- **LoRA:** `r=2`, `alpha=2`, `dropout=0.05` (see the configuration sketch below)
- **LoRA target modules:** `["q_proj","k_proj","v_proj","o_proj","gate_proj","down_proj","up_proj"]`
- **Batch size:** `4` (with `gradient_accumulation_steps=8`)
- **Max length:** `1024`
- **Optimizer:** 8‑bit optimizer when quantized (`optim_8bit_when_4bit: true`)
- **Epochs:** up to `20` with early stopping (`patience=2`)
- **Metric for best model:** `f1_micro`

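As a rough sketch, the values above could be expressed with `transformers` and `peft` configuration objects as below. The repository's actual wiring may differ; in particular, the NF4/double‑quantization settings and the `SEQ_CLS` task type are assumptions, not values taken from `configs/default.yaml`.

```python
import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig, TaskType

# 4-bit quantization of the frozen backbone (QLoRA-style); quant type and double
# quantization are common defaults assumed here, not read from the repo's config.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
)

# LoRA adapter mirroring the hyperparameters listed above.
lora_config = LoraConfig(
    r=2,
    lora_alpha=2,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "down_proj", "up_proj"],
    task_type=TaskType.SEQ_CLS,  # assumption: multi-label sequence classification head
)
```
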
#### Speeds, Sizes, Times (example runs)

- **Device:** NVIDIA GeForce RTX 3080 Ti Laptop GPU (16 GB VRAM)
- **Runtime:** ≈1,310 s for the fastest run (4 epochs), up to ≈1,984 s at 6 epochs (see the per‑run table below)
- **Throughput:** ≈0.784 steps/s (≈24.9 samples/s) during training, averaged over the four logged runs
- **Artifacts:** Reproducible outputs under `outputs/<model_name>/<dataset_name>/run_<i>/`

---

## Evaluation

### Testing Data, Factors & Metrics

- **Testing data:** Held‑out split from AmazonCat‑13K (example subset).
- **Factors:** Evaluate both **micro‑F1** (overall) and **macro‑F1** (per‑class average) to reflect long‑tail performance.
- **Metrics:** `f1_micro`, `f1_macro`, eval loss, throughput (steps/s, samples/s).

### Metrics

- **Best overall (micro-F1):** **0.830** at **5 epochs**
- **Best minority‑class sensitivity (macro-F1):** **0.752** at **6 epochs**
- **Average across 4 runs:** micro‑F1 **0.824**, macro‑F1 **0.741**, eval loss **0.161**
- **Throughput:** train ≈ **0.784 steps/s** (**24.9 samples/s**); eval time ≈ **34.0s** per run.

> Interpretation: going from **4 → 5 epochs** gives the best **micro‑F1**; **6 epochs** squeezes out the top **macro‑F1**, hinting at slightly better coverage of minority classes with a tiny trade‑off in micro‑F1.

---

### 📈 Per‑run metrics

| Run | Epochs | Train Loss | Eval Loss | F1 (micro) | F1 (macro) | Train Time (s) | Train steps/s | Train samples/s | Eval Time (s) |
|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| 1 | 4 | 1.400 | 0.157 | 0.824 | 0.738 | 1309.6 | 0.962 | 30.543 | 33.6 |
| 2 | 5 | 1.220 | 0.159 | 0.830 | 0.743 | 1640.3 | 0.768 | 24.385 | 34.0 |
| 3 | 6 | 1.063 | 0.162 | 0.826 | 0.752 | 1984.2 | 0.635 | 20.159 | 34.4 |
| 4 | 5 | 1.265 | 0.165 | 0.816 | 0.729 | 1639.3 | 0.769 | 24.401 | 34.0 |

<sub>*F1(micro)* aggregates decisions over all samples; *F1(macro)* averages per‑class F1 equally, highlighting minority‑class performance.</sub>

### Results (example)

- **Best micro‑F1:** `0.830` at 5 epochs
- **Best macro‑F1:** `0.752` at 6 epochs
- **Average across 4 runs:** micro‑F1 `0.824`, macro‑F1 `0.741`, eval loss `0.161`

#### Summary

Decoder‑only LLMs with **LoRA** adapters provide competitive multi‑label performance with small memory/compute budgets. Slightly longer training (5–6 epochs) can improve macro‑F1, capturing more minority labels with minimal micro‑F1 trade‑off.

---

## Model Examination

- Inspect confidence/threshold curves per label to tune decision thresholds.
- Use error analysis on false negatives for long‑tail labels; consider reweighting or augmentation.

---

## Environmental Impact

- **Hardware Type:** Single laptop GPU (RTX 3080 Ti Laptop, 16 GB)
- **Hours used (example run):** ~0.36 hours

---

## Technical Specifications

### Model Architecture and Objective

- **Architecture:** Decoder‑only Transformer (Llama 3.2 class), adapted via **LoRA**.
- **Objective:** Multi‑label classification formulated as conditional generation with sigmoid/thresholding for label decisions (see the sketch below).

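A minimal, generic illustration of that sigmoid/thresholding step (not the repository's code; the fixed 0.5 cut‑off is an assumption, and in practice per‑label thresholds would be tuned as described under Recommendations):

```python
import torch

# logits: (batch_size, num_labels) raw scores, one per label
logits = torch.tensor([[2.1, -0.7, 0.3],
                       [-1.5, 1.8, 0.9]])

probs = torch.sigmoid(logits)   # independent probability per label
preds = (probs >= 0.5).int()    # 1 = label assigned, 0 = not assigned
print(preds)                    # [[1, 0, 1], [0, 1, 1]]
```
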
### Compute Infrastructure

#### Hardware

- Laptop with NVIDIA GeForce RTX 3080 Ti (laptop) GPU, 16 GB VRAM.

#### Software

- Python, PyTorch, Hugging Face Transformers, PEFT, (optional) bitsandbytes for 4‑bit.

---

## Citation

If you use this work, please consider citing the following:

**BibTeX:**

```bibtex
@article{Hu2021LoRA,
  title={LoRA: Low-Rank Adaptation of Large Language Models},
  author={Edward J. Hu and Yelong Shen and Phillip Wallis and Zeyuan Allen-Zhu and Yuanzhi Li and Shean Wang and Lu Wang and Weizhu Chen},
  journal={arXiv preprint arXiv:2106.09685},
  year={2021}
}

@article{Dettmers2023QLoRA,
  title={QLoRA: Efficient Finetuning of Quantized LLMs},
  author={Tim Dettmers and Artidoro Pagnoni and Ari Holtzman and Luke Zettlemoyer},
  journal={arXiv preprint arXiv:2305.14314},
  year={2023}
}
```

**APA:**

- Hu, E. J., Shen, Y., Wallis, P., Allen‑Zhu, Z., Li, Y., Wang, S., Wang, L., & Chen, W. (2021). *LoRA: Low‑Rank Adaptation of Large Language Models*. arXiv:2106.09685.
- Dettmers, T., Pagnoni, A., Holtzman, A., & Zettlemoyer, L. (2023). *QLoRA: Efficient Finetuning of Quantized LLMs*. arXiv:2305.14314.

---

## Glossary

- **LoRA:** Low‑Rank Adaptation; injects small trainable matrices into a frozen backbone to adapt it efficiently.
- **QLoRA (4‑bit):** Finetuning with the backbone quantized to 4‑bit precision, training only LoRA adapters.
- **Micro‑/Macro‑F1:** Micro aggregates over all instances; macro averages over classes equally (sensitive to minority classes). See the example below.

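A small scikit‑learn illustration of the difference, on made‑up multi‑hot arrays (not the project's data):

```python
import numpy as np
from sklearn.metrics import f1_score

# 4 samples, 3 labels: the first label is frequent, the other two are rare.
y_true = np.array([[1, 0, 0], [1, 0, 0], [1, 0, 0], [0, 1, 1]])
y_pred = np.array([[1, 0, 0], [1, 0, 0], [1, 0, 0], [0, 0, 0]])

print(f1_score(y_true, y_pred, average="micro", zero_division=0))  # 0.75: dominated by the frequent label
print(f1_score(y_true, y_pred, average="macro", zero_division=0))  # ~0.33: the two missed rare labels count equally
```
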
---

## More Information

- The repo ships a minimal CLI (`llm_cls/cli.py`) and example YAML config (`configs/default.yaml`) to reproduce results.
- For non‑Linux environments or if `bitsandbytes` is unavailable, disable 4‑bit and train in standard precision.

---

## Model Card Authors

- **Author/Maintainer:** Amirhossein Yousefi (`amirhossein-yousefi` / `Amirhossein75`)

## Model Card Contact

- Open an issue in the GitHub repository or contact the Hugging Face user `Amirhossein75`.