A conversational LLM for summarizing phone specifications into concise, appealing descriptions for e-commerce.

**Model:** LoRA fine-tuned Llama-3.2
**Repo:** [`masabhuq/stl_phone_summarizer`](https://huggingface.co/masabhuq/stl_phone_summarizer)

---

## Installation

```bash
pip install unsloth torch
```

---

## Usage

### 1. Load Model and Tokenizer

```python
from unsloth import FastLanguageModel
from unsloth.chat_templates import get_chat_template

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="masabhuq/stl_phone_summarizer",
    max_seq_length=2048,  # assumed default; adjust to your needs
    load_in_4bit=True,    # matches the 4-bit quantized base
)
FastLanguageModel.for_inference(model)
```

### 2. Apply the Chat Template

```python
tokenizer = get_chat_template(
    tokenizer,
    chat_template="llama-3.1",  # assumed name for the Llama-3.x template used in training
    map_eos_token=True,
)
```

### 3. Prepare the Input

```python
system_prompt = (
    "You are an expert at summarizing phone specifications into short, appealing key descriptions for an e-commerce site. "
    # ...remainder of the instruction text (output format and length rules) elided...
)

# Hypothetical input; replace with the real spec sheet you want summarized.
specs = "6.7-inch 120Hz AMOLED, Dimensity 7200, 50MP + 8MP cameras, 5000mAh, 67W charging"

messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": specs},
]

formatted_prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
```

### 4. Tokenize and Generate

```python
import torch

inputs = tokenizer(formatted_prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=128,  # assumed; lengthen or shorten summaries here
    temperature=0.7,     # assumed value
    top_p=0.9,
)
```

### 5. Post-process Output

```python
generated_text = tokenizer.decode(outputs[0], skip_special_tokens=False)
# Extract the last paragraph and clean up
last_paragraph = generated_text.split("\n\n")[-1]  # assumed delimiter for the final paragraph
clean_last_paragraph = last_paragraph.split("<|eot_id|>")[0].strip()
print(clean_last_paragraph)
```

### 6. Clean Up

Free GPU memory after inference:

```python
model.cpu()
torch.cuda.empty_cache()
```
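
If you are completely done with the model, a fuller teardown (an optional extra beyond the snippet above) also drops the Python references:

```python
import gc

del model  # drop the reference so the weights can be garbage-collected
gc.collect()
torch.cuda.empty_cache()
```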

---

## Hardware Requirements

- **GPU:** CUDA-compatible GPU with ~4-6 GB VRAM for 4-bit inference (see the quick check below).
- **CPU:** Optional; used to offload the model after inference (`model.cpu()`).
- **RAM:** ~8 GB system RAM for smooth operation with dataset processing.
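
To confirm your GPU has enough memory before loading the model, a quick check with standard PyTorch calls:

```python
import torch

# Print the visible GPU and its total memory; ~4-6 GB is enough for 4-bit inference.
if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(f"{props.name}: {props.total_memory / 1024**3:.1f} GiB VRAM")
else:
    print("No CUDA-compatible GPU detected.")
```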

---

## Notes

- **Chat Template:** The tokenizer is uploaded without a chat template. Always apply the template at runtime as shown above.
- **Output Format:** The model is trained to output in a strict format for easy parsing.
- **Memory Management:** Use `model.cpu()` and `torch.cuda.empty_cache()` to free GPU memory after inference, especially on low-VRAM GPUs.
- **Inference Parameters:** Adjust `temperature` and `top_p` for more or less creative output, and `max_new_tokens` for longer or shorter summaries; a sample configuration follows below.
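
For example, a more conservative decoding setup for shorter, more predictable summaries might look like this (values are illustrative, not tuned recommendations; `inputs` comes from step 4):

```python
# Illustrative low-temperature settings, reusing `inputs` from step 4.
outputs = model.generate(
    **inputs,
    max_new_tokens=96,   # shorter summaries
    temperature=0.3,     # more deterministic phrasing
    top_p=0.8,
)
```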

---

## Model Details

- **Base Model:** `unsloth/Llama-3.2-3B-Instruct-bnb-4bit`
- **Fine-Tuning:** LoRA adapters with rank `r=16`, targeting the modules `["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"]` (a setup sketch follows this list).
- **Quantization:** 4-bit for memory efficiency (~4-6 GB VRAM).
- **Training Data:** A dataset of phone specifications (`specs`) paired with concise summaries (`output`) in the format shown in the Dataset section below.
- **Training Setup:** Fine-tuned with `trl.SFTTrainer`, using `train_on_responses_only` to compute loss only on assistant responses, and the Llama-3.2 chat template for single-turn interactions.
- **Output Constraints:** Summaries are limited to 280 characters, focus on user-friendly features, and avoid technical terms like "IP68" or "IPDC".
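
For reference, a minimal sketch of how such an adapter configuration is typically created with Unsloth. Only `r=16` and the `target_modules` list come from the details above; the remaining hyperparameters are assumptions, not the exact training recipe:

```python
from unsloth import FastLanguageModel

# Hypothetical reconstruction of the training-time adapter setup.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_alpha=16,     # assumed; not documented above
    lora_dropout=0.0,  # assumed
    bias="none",       # assumed
)
```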

---

## Dataset

The model was trained on a custom dataset (`specs_list.json`) containing pairs of detailed phone specifications and their corresponding summaries. Each entry includes:

- `specs`: Detailed technical specs (e.g., display size, chipset, camera details).
- `output`: A concise summary in the format:
  ```
  ...
  Others: [features]
  ```

The dataset emphasizes consumer-friendly features like high refresh rates, fast charging, and water resistance, avoiding overly technical terms.
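
For illustration, a hypothetical entry; all values are invented, and only the `specs`/`output` field names come from the dataset description:

```python
# Hypothetical specs_list.json entry; values are illustrative only.
example_entry = {
    "specs": "6.7-inch 120Hz AMOLED, Dimensity 7200, 50MP + 8MP cameras, "
             "5000mAh battery, 67W wired charging, IP68 rating",
    "output": "Display: Silky 120Hz AMOLED\n...\nOthers: Water resistant, 67W fast charging",
}
```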

---

## License

This model is licensed under the [Apache 2.0 License](LICENSE). See the `LICENSE` file in the repository for details.

---

## Citation

If you use this model, please cite the repository:

```bibtex
@misc{stl_phone_summarizer,
  author = {masabhuq},
  title = {stl_phone_summarizer},
  url = {https://huggingface.co/masabhuq/stl_phone_summarizer},
}
```