masabhuq committed
Commit 8d5a549 · verified · 1 Parent(s): 67e43c9

updated readme

Files changed (1):
  1. README.md +0 -34
README.md CHANGED
@@ -19,21 +19,14 @@ language:
  A conversational LLM for summarizing phone specifications into concise, appealing descriptions for e-commerce.
  **Model:** LoRA fine-tuned Llama-3.2
  **Repo:** [`masabhuq/stl_phone_summarizer`](https://huggingface.co/masabhuq/stl_phone_summarizer)
-
  ---
-
  ## Installation
-
  ```bash
  pip install unsloth torch
  ```
-
  ---
-
  ## Usage
-
  ### 1. Load Model and Tokenizer
-
  ```python
  from unsloth import FastLanguageModel
  from unsloth.chat_templates import get_chat_template
@@ -46,9 +39,7 @@ model, tokenizer = FastLanguageModel.from_pretrained(
  )
  FastLanguageModel.for_inference(model)
  ```
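The hunk above elides the arguments to `from_pretrained`. A minimal sketch of what the full call plausibly looks like, assuming the adapter repo linked above, a 2048-token context, and 4-bit loading (none of these values appear in the diff):

```python
from unsloth import FastLanguageModel

# Hedged reconstruction -- the diff hides these arguments.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="masabhuq/stl_phone_summarizer",  # adapter repo linked above
    max_seq_length=2048,  # assumption; use the value from training
    load_in_4bit=True,    # consistent with the 4-bit base model listed below
)
FastLanguageModel.for_inference(model)  # switch Unsloth to inference mode
```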
-
  ### 2. Apply the Chat Template
-
  ```python
  tokenizer = get_chat_template(
  tokenizer,
@@ -56,9 +47,7 @@ tokenizer = get_chat_template(
  map_eos_token=True,
  )
  ```
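The template name itself is hidden by the hunk. Unsloth's `get_chat_template` takes the template as a string; a sketch assuming the Llama-3 template (the name is an assumption, not read from the diff):

```python
from unsloth.chat_templates import get_chat_template

tokenizer = get_chat_template(
    tokenizer,
    chat_template="llama-3",  # assumed template name for a Llama-3.2 base
    map_eos_token=True,       # map the template's end-of-turn token to EOS
)
```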
-
  ### 3. Prepare the Input
-
  ```python
  system_prompt = (
  "You are an expert at summarizing phone specifications into short, appealing key descriptions for an e-commerce site. "
@@ -87,9 +76,7 @@ formatted_prompt = tokenizer.apply_chat_template(
  add_generation_prompt=True,
  )
  ```
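Only the start of the system prompt and the tail of the `apply_chat_template` call survive the hunk. A sketch of the elided middle, where `phone_specs` is a hypothetical variable holding the raw spec sheet:

```python
phone_specs = "Display: 6.7-inch AMOLED ..."  # hypothetical raw spec sheet

messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": phone_specs},
]
formatted_prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,              # return a formatted string, not token IDs
    add_generation_prompt=True,  # append the assistant header so the model replies
)
```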
-
  ### 4. Tokenize and Generate
-
  ```python
  import torch

@@ -102,9 +89,7 @@ outputs = model.generate(
  top_p=0.9,
  )
  ```
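Of the generation call, only `top_p=0.9` is visible here. A sketch of a complete tokenize-and-generate step; `max_new_tokens` and `temperature` are assumptions chosen to fit the 280-character output constraint:

```python
inputs = tokenizer(formatted_prompt, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=128,  # assumed; ample for a 280-character summary
    temperature=0.7,     # assumed; raise or lower for more or less creativity
    top_p=0.9,           # the one value visible in the diff
)
```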
-
  ### 5. Post-process Output
-
  ```python
  generated_text = tokenizer.decode(outputs[0], skip_special_tokens=False)
  # Extract the last paragraph and clean up
@@ -114,24 +99,17 @@ clean_last_paragraph = last_paragraph.split("<|eot_id|>")[0].strip()
  print(clean_last_paragraph)
  ```
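The `last_paragraph` assignment falls in the elided region (its follow-up line appears in the hunk header). A sketch of the full post-processing, assuming paragraphs are separated by blank lines:

```python
generated_text = tokenizer.decode(outputs[0], skip_special_tokens=False)

# Assumed reconstruction: keep the text after the last blank line,
# then trim at the Llama end-of-turn marker.
last_paragraph = generated_text.split("\n\n")[-1]
clean_last_paragraph = last_paragraph.split("<|eot_id|>")[0].strip()
print(clean_last_paragraph)
```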
  ### 6. Clean Up
-
  Free GPU memory after inference:
-
  ```python
  model.cpu()
  torch.cuda.empty_cache()
  ```
-
  ---
  ## Hardware Requirements
-
  - **GPU**: CUDA-compatible GPU with ~4-6GB VRAM for 4-bit inference.
  - **CPU**: Optional, for offloading the model after inference (`model.cpu()`).
  - **RAM**: ~8GB system RAM for smooth operation with dataset processing.
-
-
  ---
-
  ## Notes

  - **Chat Template:** The tokenizer is uploaded without a chat template. Always apply the template at runtime as shown above.
@@ -139,21 +117,16 @@ torch.cuda.empty_cache()
  - **Output Format:** The model is trained to output in a strict format for easy parsing.
  - **Memory Management**: Use `model.cpu()` and `torch.cuda.empty_cache()` to free GPU memory after inference, especially on low-VRAM GPUs.
  - **Inference Parameters**: Adjust `temperature` and `top_p` for more or less creative outputs, and `max_new_tokens` for longer or shorter summaries.
-
  ---
  ## Model Details
-
  - **Base Model**: `unsloth/Llama-3.2-3B-Instruct-bnb-4bit`
  - **Fine-Tuning**: LoRA adapters with rank `r=16`, targeting the modules `["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"]` (see the configuration sketch after this list).
  - **Quantization**: 4-bit for memory efficiency (~4-6GB VRAM).
  - **Training Data**: A dataset of phone specifications (`specs`) paired with concise summaries (`output`) in the format shown above.
  - **Training Setup**: Fine-tuned with `trl.SFTTrainer`, using `train_on_responses_only` to focus the loss on assistant responses, and the Llama-3.2 chat template for single-turn interactions.
  - **Output Constraints**: Summaries are limited to 280 characters, focusing on user-friendly features and avoiding technical terms like "IP68" or "IPDC".
-
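For reference, the LoRA configuration in the Fine-Tuning bullet would be expressed with Unsloth roughly as below; `lora_alpha` and `lora_dropout` are assumed defaults, and only `r=16` and the module list come from this page:

```python
# Sketch of the stated LoRA setup; alpha/dropout values are assumptions.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,  # rank, as stated above
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_alpha=16,     # assumed
    lora_dropout=0.0,  # assumed
)
```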
  ---
-
  ## Dataset
-
  The model was trained on a custom dataset (`specs_list.json`) containing pairs of detailed phone specifications and their corresponding summaries. Each entry includes:
  - `specs`: Detailed technical specs (e.g., display size, chipset, camera details).
  - `output`: A concise summary in the format:
@@ -165,19 +138,15 @@ The model was trained on a custom dataset (`specs_list.json`) containing pairs o
  Others: [features]
  ```
  The dataset emphasizes consumer-friendly features like high refresh rates, fast charging, and water resistance, avoiding overly technical terms.
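For illustration, a single `specs_list.json` entry might look like the sketch below. Both values are invented; only the field names (`specs`, `output`) come from this page, and the lines of the output format elided by the hunk are left as a placeholder rather than reconstructed:

```python
# Invented example entry -- values are illustrative only.
example_entry = {
    "specs": "6.7-inch 120Hz AMOLED display, 50MP main camera, "
             "5000mAh battery with 67W charging, IP68 rating",
    "output": "...\nOthers: Fast charging, water resistant",  # "..." stands for the elided format lines
}
```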
-
  ---

  ## License
-
  This model is licensed under the [Apache 2.0 License](LICENSE). See the `LICENSE` file in the repository for details.

  ---

  ## Citation
-
  If you use this model, please cite the repository:
-
  ```bibtex
  @misc{stl_phone_summarizer,
  author = {masabhuq},
@@ -188,12 +157,9 @@ If you use this model, please cite the repository:
  }
  ```
  ---