A conversational LLM for summarizing phone specifications into concise, appealing descriptions for e-commerce.

**Model:** LoRA fine-tuned Llama-3.2
**Repo:** [`masabhuq/stl_phone_summarizer`](https://huggingface.co/masabhuq/stl_phone_summarizer)

---

## Installation

```bash
pip install unsloth torch
```

---

## Usage

### 1. Load Model and Tokenizer

```python
from unsloth import FastLanguageModel
from unsloth.chat_templates import get_chat_template

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="masabhuq/stl_phone_summarizer",
    max_seq_length=2048,  # assumed default; adjust to your needs
    load_in_4bit=True,    # matches the 4-bit quantized base
)
FastLanguageModel.for_inference(model)
```

### 2. Apply the Chat Template

```python
tokenizer = get_chat_template(
    tokenizer,
    chat_template="llama-3.1",  # assumed name for the Llama-3.x template used in training
    map_eos_token=True,
)
```

### 3. Prepare the Input

```python
system_prompt = (
    "You are an expert at summarizing phone specifications into short, appealing key descriptions for an e-commerce site. "
    # ...remainder of the instruction text (output format and length rules) elided...
)

# Hypothetical input; replace with the real spec sheet you want summarized.
specs = "6.7-inch 120Hz AMOLED, Dimensity 7200, 50MP + 8MP cameras, 5000mAh, 67W charging"

messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": specs},
]

formatted_prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
```

### 4. Tokenize and Generate

```python
import torch

inputs = tokenizer(formatted_prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=128,  # assumed; lengthen or shorten summaries here
    temperature=0.7,     # assumed value
    top_p=0.9,
)
```

### 5. Post-process Output

```python
generated_text = tokenizer.decode(outputs[0], skip_special_tokens=False)
# Extract the last paragraph and clean up
last_paragraph = generated_text.split("\n\n")[-1]  # assumed delimiter for the final paragraph
clean_last_paragraph = last_paragraph.split("<|eot_id|>")[0].strip()
print(clean_last_paragraph)
```

### 6. Clean Up

Free GPU memory after inference:

```python
model.cpu()
torch.cuda.empty_cache()
```
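
If you are completely done with the model, a fuller teardown (an optional extra beyond the snippet above) also drops the Python references:

```python
import gc

del model  # drop the reference so the weights can be garbage-collected
gc.collect()
torch.cuda.empty_cache()
```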

---

## Hardware Requirements

- **GPU:** CUDA-compatible GPU with ~4-6 GB VRAM for 4-bit inference (see the quick check below).
- **CPU:** Optional; used to offload the model after inference (`model.cpu()`).
- **RAM:** ~8 GB system RAM for smooth operation with dataset processing.
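
To confirm your GPU has enough memory before loading the model, a quick check with standard PyTorch calls:

```python
import torch

# Print the visible GPU and its total memory; ~4-6 GB is enough for 4-bit inference.
if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(f"{props.name}: {props.total_memory / 1024**3:.1f} GiB VRAM")
else:
    print("No CUDA-compatible GPU detected.")
```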

---

## Notes

- **Chat Template:** The tokenizer is uploaded without a chat template. Always apply the template at runtime as shown above.
- **Output Format:** The model is trained to output in a strict format for easy parsing.
- **Memory Management:** Use `model.cpu()` and `torch.cuda.empty_cache()` to free GPU memory after inference, especially on low-VRAM GPUs.
- **Inference Parameters:** Adjust `temperature` and `top_p` for more or less creative output, and `max_new_tokens` for longer or shorter summaries; a sample configuration follows below.
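
For example, a more conservative decoding setup for shorter, more predictable summaries might look like this (values are illustrative, not tuned recommendations; `inputs` comes from step 4):

```python
# Illustrative low-temperature settings, reusing `inputs` from step 4.
outputs = model.generate(
    **inputs,
    max_new_tokens=96,   # shorter summaries
    temperature=0.3,     # more deterministic phrasing
    top_p=0.8,
)
```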

---

## Model Details

- **Base Model:** `unsloth/Llama-3.2-3B-Instruct-bnb-4bit`
- **Fine-Tuning:** LoRA adapters with rank `r=16`, targeting the modules `["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"]` (a setup sketch follows this list).
- **Quantization:** 4-bit for memory efficiency (~4-6 GB VRAM).
- **Training Data:** A dataset of phone specifications (`specs`) paired with concise summaries (`output`) in the format shown in the Dataset section below.
- **Training Setup:** Fine-tuned with `trl.SFTTrainer`, using `train_on_responses_only` to compute loss only on assistant responses, and the Llama-3.2 chat template for single-turn interactions.
- **Output Constraints:** Summaries are limited to 280 characters, focus on user-friendly features, and avoid technical terms like "IP68" or "IPDC".
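
For reference, a minimal sketch of how such an adapter configuration is typically created with Unsloth. Only `r=16` and the `target_modules` list come from the details above; the remaining hyperparameters are assumptions, not the exact training recipe:

```python
from unsloth import FastLanguageModel

# Hypothetical reconstruction of the training-time adapter setup.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_alpha=16,     # assumed; not documented above
    lora_dropout=0.0,  # assumed
    bias="none",       # assumed
)
```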

---

## Dataset

The model was trained on a custom dataset (`specs_list.json`) containing pairs of detailed phone specifications and their corresponding summaries. Each entry includes:

- `specs`: Detailed technical specs (e.g., display size, chipset, camera details).
- `output`: A concise summary in the format:
  ```
  ...
  Others: [features]
  ```

The dataset emphasizes consumer-friendly features like high refresh rates, fast charging, and water resistance, avoiding overly technical terms.
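
For illustration, a hypothetical entry; all values are invented, and only the `specs`/`output` field names come from the dataset description:

```python
# Hypothetical specs_list.json entry; values are illustrative only.
example_entry = {
    "specs": "6.7-inch 120Hz AMOLED, Dimensity 7200, 50MP + 8MP cameras, "
             "5000mAh battery, 67W wired charging, IP68 rating",
    "output": "Display: Silky 120Hz AMOLED\n...\nOthers: Water resistant, 67W fast charging",
}
```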

---

## License

This model is licensed under the [Apache 2.0 License](LICENSE). See the `LICENSE` file in the repository for details.

---

## Citation

If you use this model, please cite the repository:

```bibtex
@misc{stl_phone_summarizer,
  author = {masabhuq},
  title = {stl_phone_summarizer},
  url = {https://huggingface.co/masabhuq/stl_phone_summarizer},
}
```