Amirhossein75 committed on
Commit 7366eb9 · verified · 1 Parent(s): d76bb31

Create README.md

Files changed (1):
  1. README.md +300 -0

README.md ADDED
---
# For reference on model card metadata, see the spec: https://github.com/huggingface/hub-docs/blob/main/modelcard.md?plain=1
# Doc / guide: https://huggingface.co/docs/hub/model-cards
datasets:
- "pietrolesci/amazoncat-13k"
language:
- en
library_name: transformers
license: other  # Base model (Meta Llama 3.2) is under the Llama 3.2 Community License
pipeline_tag: text-classification
tags:
- multi-label
- LoRA
- QLoRA
- bitsandbytes
- decoder-only
- llama-3.2-1b
- peft
- text-classification
base_model: meta-llama/Llama-3.2-1B
---

# Model Card for Amirhossein75/LLM-Decoder-Tuning-Text-Classification

> **One‑line summary:** Decoder‑only LLMs (e.g., Llama‑3.2‑1B) fine‑tuned for **multi‑label text classification** using **LoRA** adapters, with optional **4‑bit QLoRA** quantization for memory‑efficient training and inference. A clean CLI and YAML config make it easy to reproduce results and swap backbones.

This model card accompanies the repository **LLM‑Decoder‑Tuning‑Text‑Classification** and documents a practical recipe for using decoder‑only LLMs as strong multi‑label classifiers with parameter‑efficient fine‑tuning (PEFT).

> **Note:** This card describes a *training pipeline + example checkpoints*. If you push a specific checkpoint to the Hub, please fill in exact dataset splits, metrics, and license at upload time.

---

## Model Details

### Model Description

This project provides a **modular training & inference stack** for multi‑label text classification built on top of **Hugging Face Transformers** and **PEFT**. It adapts **decoder‑only** LLMs (tested with `meta-llama/Llama-3.2-1B`) using **LoRA** adapters, and optionally enables **4‑bit quantization** (QLoRA‑style) for a reduced memory footprint during training and inference. The repository exposes a **single CLI** for train/eval/predict and a **YAML configuration** to control data paths, model choice, and hyperparameters.

- **Developed by:** Amirhossein Yousefi (GitHub: `amirhossein-yousefi`; Hugging Face: `Amirhossein75`)
- **Model type:** Decoder‑only causal LM with PEFT (LoRA) for multi‑label classification
- **Language(s):** English (evaluated on an AmazonCat‑13K subset)
- **License:** The **base model** (`meta-llama/Llama-3.2-1B`) is under the **Llama 3.2 Community License**. The LoRA adapter you publish should declare its own license and acknowledge the base‑model terms.
- **Finetuned from:** `meta-llama/Llama-3.2-1B` (foundation model)

### Model Sources

- **Repository:** https://github.com/amirhossein-yousefi/LLM-Decoder-Tuning-Text-Classification
- **Model (Hub placeholder):** https://huggingface.co/Amirhossein75/LLM-Decoder-Tuning-Text-Classification
- **Background reading:**
  - LoRA: Low‑Rank Adaptation of Large Language Models (Hu et al., 2021)
  - QLoRA: Efficient Finetuning of Quantized LLMs (Dettmers et al., 2023)
  - PEFT documentation (Hugging Face)

---

## Uses

### Direct Use

- **Multi‑label text classification** on English corpora (e.g., product tagging, topic tagging, content routing).
- Inference via:
  - the provided **CLI** (`python -m llm_cls.cli predict --config ...`), which produces JSONL predictions;
  - Hugging Face pipelines with the base model + LoRA adapter loaded (see “How to Get Started”).

### Downstream Use

- **Domain transfer:** Re‑train on your domain labels by pointing the YAML config to your CSVs.
- **Backbone swap:** Replace `model.model_name` in the config to try other decoders or encoders (set `use_4bit=false` for encoders).

### Out‑of‑Scope Use

- Safety‑critical decisions without human oversight.
- Tasks requiring **extreme multi‑label** scaling (e.g., hundreds of thousands of labels) without additional adaptation.
- Non‑English or code‑mixed data without validation.
- Any use that conflicts with the base model’s license and acceptable‑use policies.

---

## Bias, Risks, and Limitations

- **Dataset bias:** AmazonCat‑13K originates from product data; labels and text reflect marketplace distributions and may encode demographic or topical biases.
- **Multi‑label long tail:** Minority classes are harder; macro‑F1 often trails micro‑F1. Consider class weighting, augmentation, or threshold tuning.
- **Decoder framing:** Treating classification as generation can be sensitive to prompt/format and decoding thresholds.
- **License & usage constraints:** Ensure compliance with the Llama 3.2 Community License for derivatives and deployment.

### Recommendations

- Track **micro‑ and macro‑F1** and per‑class metrics.
- Use **threshold tuning** on validation to balance precision/recall per class (see the sketch below).
- For memory‑constrained environments, prefer **4‑bit + LoRA**; otherwise disable 4‑bit on platforms without `bitsandbytes` support.
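
A minimal sketch of the per‑class threshold tuning mentioned above, assuming you already have per‑label probabilities (e.g., sigmoid scores) and multi‑hot ground truth for a validation split; the array names and the scikit‑learn dependency are illustrative, not part of this repo's pipeline:

```python
import numpy as np
from sklearn.metrics import f1_score

def tune_thresholds(y_true: np.ndarray, y_prob: np.ndarray,
                    grid=np.linspace(0.05, 0.95, 19)) -> np.ndarray:
    """Pick, per label, the decision threshold that maximizes that label's F1 on validation."""
    n_labels = y_true.shape[1]
    thresholds = np.full(n_labels, 0.5)
    for j in range(n_labels):
        best = -1.0
        for t in grid:
            f1 = f1_score(y_true[:, j], (y_prob[:, j] >= t).astype(int), zero_division=0)
            if f1 > best:
                best, thresholds[j] = f1, t
    return thresholds

# Usage (y_val_true / y_val_prob are hypothetical arrays of shape [n_samples, n_labels]):
# thresholds = tune_thresholds(y_val_true, y_val_prob)
# y_pred = (y_val_prob >= thresholds).astype(int)
# print("micro-F1:", f1_score(y_val_true, y_pred, average="micro", zero_division=0))
# print("macro-F1:", f1_score(y_val_true, y_pred, average="macro", zero_division=0))
```

Per‑class thresholds tend to help macro‑F1 more than micro‑F1, since minority labels are rarely best served by a flat 0.5 cut‑off.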

---

## How to Get Started with the Model

Below is an example of loading a base Llama model with a LoRA adapter for classification‑style inference. Replace `BASE_MODEL` and `ADAPTER_REPO` with your IDs.

```python
from transformers import AutoTokenizer, AutoModelForCausalLM, TextGenerationPipeline
from peft import PeftModel
import torch

BASE_MODEL = "meta-llama/Llama-3.2-1B"
ADAPTER_REPO = "Amirhossein75/LLM-Decoder-Tuning-Text-Classification"  # or your own adapter

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL, use_fast=True)
base = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model = PeftModel.from_pretrained(base, ADAPTER_REPO)
model.eval()

# Simple prompt format for multi-label classification (adjust to your training format).
labels = ["books", "movies_tv", "music", "pop", "literature_fiction", "movies", "education_reference", "rock", "used_rental_textbooks", "new"]
text = "A thrilling space opera with deep character arcs and rich world-building."

prompt = (
    "You are a classifier. Given the text, return a JSON list of applicable labels from this set: "
    + ", ".join(labels) + ".\n"
    + f"Text: {text}\nLabels: "
)

pipe = TextGenerationPipeline(model=model, tokenizer=tokenizer, device=0 if torch.cuda.is_available() else -1)
out = pipe(prompt, max_new_tokens=64, do_sample=False)
print(out[0]["generated_text"])
```
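
If the adapter was trained to emit a JSON‑style list (as the prompt above requests), the continuation can be post‑processed into a clean label set. The parsing below is an illustrative sketch that continues the snippet above, not this repo's own decoding logic; adapt it to whatever output format your adapter was actually trained on:

```python
import json
import re

def parse_labels(generated_text: str, prompt: str, label_set: list[str]) -> list[str]:
    """Extract predicted labels from the generated continuation (illustrative parsing)."""
    continuation = generated_text[len(prompt):]  # the pipeline returns prompt + continuation by default
    match = re.search(r"\[.*?\]", continuation, flags=re.DOTALL)
    if match:
        try:
            candidates = json.loads(match.group(0))  # e.g. '["books", "literature_fiction"]'
        except json.JSONDecodeError:
            candidates = []
    else:
        candidates = [part.strip() for part in continuation.split(",")]  # fallback: comma-separated labels
    return [c.strip() for c in candidates if isinstance(c, str) and c.strip() in label_set]

predicted = parse_labels(out[0]["generated_text"], prompt, labels)
print(predicted)
```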

For **CLI usage**:

```bash
# Train
python -m llm_cls.cli train --config configs/default.yaml

# Predict
python -m llm_cls.cli predict --config configs/default.yaml --input_csv data/test.csv --output_jsonl preds.jsonl
```

---

## Training Details

### Training Data

- **Dataset:** AmazonCat‑13K (example subset; top‑10 categories for illustration). If you use the full dataset, update CSV paths and label columns accordingly.
- **Format:** CSV with at least a text column and one or more label columns (multi‑label). Configure the column names in `configs/default.yaml`; an illustrative example follows this list.
- **Splits:** Train / Validation / (Optional) Test; sample scripts are provided to create CSV splits.
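
For illustration only, a tiny CSV in roughly the expected shape could be written as below; the column layout (`text` plus one 0/1 column per label) and the path are assumptions, so match them to whatever your `configs/default.yaml` declares:

```python
import pandas as pd

# Hypothetical multi-label CSV: one free-text column plus one 0/1 indicator column per label.
train_df = pd.DataFrame(
    {
        "text": [
            "A thrilling space opera with deep character arcs.",
            "Acoustic covers of classic rock anthems.",
        ],
        "books": [1, 0],
        "literature_fiction": [1, 0],
        "music": [0, 1],
        "rock": [0, 1],
    }
)
train_df.to_csv("data/train.csv", index=False)  # example path; point the YAML config at your own files
print(train_df)
```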

### Training Procedure

#### Preprocessing

- Tokenization with the base model’s tokenizer.
- Optional script to prepare AmazonCat‑13K CSVs (see `split_amazon_13k_data.py` in the repo).

#### Training Hyperparameters (illustrative config)

- **Base model:** `meta-llama/Llama-3.2-1B`
- **Problem type:** `multi_label_classification`
- **Precision / quantization:** `use_4bit: true` (QLoRA‑style); `torch_dtype: bfloat16` for computation
- **LoRA:** `r=2`, `alpha=2`, `dropout=0.05` (see the PEFT sketch after this list)
- **LoRA target modules:** `["q_proj","k_proj","v_proj","o_proj","gate_proj","down_proj","up_proj"]`
- **Batch size:** `4` (with `gradient_accumulation_steps=8`)
- **Max length:** `1024`
- **Optimizer:** 8‑bit optimizer when quantized (`optim_8bit_when_4bit: true`)
- **Epochs:** up to `20` with early stopping (`patience=2`)
- **Metric for best model:** `f1_micro`
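
The LoRA and quantization settings above translate roughly into the following PEFT / bitsandbytes objects. This is a hedged sketch rather than the repo's exact training code; in particular, the `nf4` quant type and the `SEQ_CLS` task type are assumptions that may differ from what `configs/default.yaml` actually selects:

```python
import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig

# 4-bit (QLoRA-style) quantization for the frozen backbone, computing in bfloat16.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # use_4bit: true
    bnb_4bit_compute_dtype=torch.bfloat16,  # torch_dtype: bfloat16
    bnb_4bit_quant_type="nf4",              # assumption: NF4, the common QLoRA default
)

# LoRA adapter mirroring the hyperparameters listed above.
lora_config = LoraConfig(
    r=2,
    lora_alpha=2,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "down_proj", "up_proj"],
    task_type="SEQ_CLS",  # assumption: classification head; the repo may frame the task differently
)
```

In a typical Transformers/PEFT setup these are passed as `quantization_config=bnb_config` to `from_pretrained` and the model is then wrapped with `get_peft_model(model, lora_config)`; check the repo's trainer code for the exact wiring.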

#### Speeds, Sizes, Times (example run)

- **Device:** NVIDIA GeForce RTX 3080 Ti Laptop GPU (16 GB VRAM)
- **Runtime:** ≈1,310–1,984 seconds per training run (see the per‑run table under Evaluation)
- **Throughput:** ≈0.784 steps/s (≈24.9 samples/s) on average during training
- **Artifacts:** Reproducible outputs under `outputs/<model_name>/<dataset_name>/run_<i>/`

---

## Evaluation

### Testing Data, Factors & Metrics

- **Testing data:** Held‑out split from AmazonCat‑13K (example subset).
- **Factors:** Evaluate both **micro‑F1** (overall) and **macro‑F1** (per‑class average) to reflect long‑tail performance.
- **Metrics:** `f1_micro`, `f1_macro`, eval loss, throughput (steps/s, samples/s).

### Results (example)

- **Best overall (micro‑F1):** **0.830** at **5 epochs**
- **Best minority‑class sensitivity (macro‑F1):** **0.752** at **6 epochs**
- **Average across 4 runs:** micro‑F1 **0.824**, macro‑F1 **0.741**, eval loss **0.161**
- **Throughput:** train ≈ **0.784 steps/s** (≈ **24.9 samples/s**); eval time ≈ **34.0 s** per run

> Interpretation: going from **4 → 5 epochs** gives the best **micro‑F1**, while **6 epochs** yields the top **macro‑F1**, hinting at slightly better coverage of minority classes with a small trade‑off in micro‑F1.

### 📈 Per‑run metrics

| Run | Epochs | Train Loss | Eval Loss | F1 (micro) | F1 (macro) | Train Time (s) | Train steps/s | Train samples/s | Eval Time (s) |
|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| 1 | 4 | 1.400 | 0.157 | 0.824 | 0.738 | 1309.6 | 0.962 | 30.543 | 33.6 |
| 2 | 5 | 1.220 | 0.159 | 0.830 | 0.743 | 1640.3 | 0.768 | 24.385 | 34.0 |
| 3 | 6 | 1.063 | 0.162 | 0.826 | 0.752 | 1984.2 | 0.635 | 20.159 | 34.4 |
| 4 | 5 | 1.265 | 0.165 | 0.816 | 0.729 | 1639.3 | 0.769 | 24.401 | 34.0 |

<sub>*F1 (micro)* aggregates decisions over all samples; *F1 (macro)* averages per‑class F1 equally, highlighting minority‑class performance.</sub>

#### Summary

Decoder‑only LLMs with **LoRA** adapters provide competitive multi‑label performance on small memory and compute budgets. Slightly longer training (5–6 epochs) can improve macro‑F1, capturing more minority labels with minimal micro‑F1 trade‑off.

---

## Model Examination

- Inspect confidence/threshold curves per label to tune decision thresholds.
- Use error analysis on false negatives for long‑tail labels; consider reweighting or augmentation.

---

## Environmental Impact

- **Hardware Type:** Single laptop GPU (RTX 3080 Ti Laptop, 16 GB)
- **Hours used (example run):** ~0.36 hours

---

## Technical Specifications

### Model Architecture and Objective

- **Architecture:** Decoder‑only Transformer (Llama 3.2 class), adapted via **LoRA**.
- **Objective:** Multi‑label classification formulated as conditional generation, with sigmoid/thresholding for label decisions (a minimal illustration follows this list).
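
A minimal illustration of the sigmoid/thresholding step, assuming per‑label scores (logits) are available; both the example logits and the flat 0.5 threshold here are placeholders rather than this repo's actual decision rule:

```python
import torch

# Hypothetical per-label logits for one example over the ten example labels.
label_names = ["books", "movies_tv", "music", "pop", "literature_fiction",
               "movies", "education_reference", "rock", "used_rental_textbooks", "new"]
logits = torch.tensor([2.1, -0.4, 0.3, -3.0, 1.2, -1.1, 0.9, -2.2, 0.1, -0.7])

probs = torch.sigmoid(logits)  # independent probability per label (multi-label, not softmax)
threshold = 0.5                # or per-label thresholds tuned on validation
predicted = [name for name, p in zip(label_names, probs) if p.item() >= threshold]
print(predicted)  # ['books', 'music', 'literature_fiction', 'education_reference', 'used_rental_textbooks']
```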

### Compute Infrastructure

#### Hardware

- Laptop with an NVIDIA GeForce RTX 3080 Ti Laptop GPU (16 GB VRAM).

#### Software

- Python, PyTorch, Hugging Face Transformers, PEFT, and (optionally) bitsandbytes for 4‑bit quantization.

---

## Citation

If you use this work, please consider citing the following:

**BibTeX:**

```bibtex
@article{Hu2021LoRA,
  title={LoRA: Low-Rank Adaptation of Large Language Models},
  author={Edward J. Hu and Yelong Shen and Phillip Wallis and Zeyuan Allen-Zhu and Yuanzhi Li and Shean Wang and Lu Wang and Weizhu Chen},
  journal={arXiv preprint arXiv:2106.09685},
  year={2021}
}

@article{Dettmers2023QLoRA,
  title={QLoRA: Efficient Finetuning of Quantized LLMs},
  author={Tim Dettmers and Artidoro Pagnoni and Ari Holtzman and Luke Zettlemoyer},
  journal={arXiv preprint arXiv:2305.14314},
  year={2023}
}
```

**APA:**

- Hu, E. J., Shen, Y., Wallis, P., Allen‑Zhu, Z., Li, Y., Wang, S., Wang, L., & Chen, W. (2021). *LoRA: Low‑Rank Adaptation of Large Language Models*. arXiv:2106.09685.
- Dettmers, T., Pagnoni, A., Holtzman, A., & Zettlemoyer, L. (2023). *QLoRA: Efficient Finetuning of Quantized LLMs*. arXiv:2305.14314.

---

## Glossary

- **LoRA:** Low‑Rank Adaptation; injects small trainable matrices into a frozen backbone to adapt it efficiently.
- **QLoRA (4‑bit):** Fine‑tuning with the backbone quantized to 4‑bit precision, training only the LoRA adapters.
- **Micro‑/Macro‑F1:** Micro aggregates over all instances; macro averages over classes equally (sensitive to minority classes).

---

## More Information

- The repo ships a minimal CLI (`llm_cls/cli.py`) and an example YAML config (`configs/default.yaml`) to reproduce results.
- For non‑Linux environments, or if `bitsandbytes` is unavailable, disable 4‑bit and train in standard precision.

---

## Model Card Authors

- **Author/Maintainer:** Amirhossein Yousefi (`amirhossein-yousefi` / `Amirhossein75`)

## Model Card Contact

- Open an issue in the GitHub repository or contact the Hugging Face user `Amirhossein75`.