update readme
README.md

---
library_name: peft
license: apache-2.0
base_model: unsloth/SmolLM2-360M-Instruct
tags:
- unsloth
- trl
---

[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="200" height="32"/>](https://wandb.ai/pesi/SmolLM2-360M-Instruct-TaiwanChat_CLOUD/runs/9fnxruem)

# SmolLM2-360M-Instruct-TaiwanChat

This model is a fine-tuned version of [unsloth/SmolLM2-360M-Instruct](https://huggingface.co/unsloth/SmolLM2-360M-Instruct) on the TaiwanChat dataset, using Unsloth's 4-bit quantization and LoRA adapters for efficient instruction following in Traditional Chinese.

## Installation

```bash
pip install -r requirements.txt
```

## Requirements

* **Python**: 3.8 or higher
* **CUDA**: 11.0 or higher (for GPU support)
* See [requirements.txt](requirements.txt) for exact package versions.

## Model description

* **Base**: SmolLM2-360M-Instruct (360M parameters)
* **Quantization**: 4-bit weight quantization (activations kept in full precision)
* **Adapters**: LoRA with rank `r=16`, alpha `α=16`, dropout `0.0`, applied to the projection layers (`q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj`)
* **Dataset**: TaiwanChat (`yentinglin/TaiwanChat`), 600k filtered examples with a maximum length of 512, streamed and deduplicated, then split 90% train / 10% validation
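
For reference, here is a minimal sketch of how the settings above map onto Unsloth's `FastLanguageModel` API. It is an illustration assuming defaults for anything the card does not list, not the actual training script:

```python
from unsloth import FastLanguageModel

# Load the base model with 4-bit weight quantization.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/SmolLM2-360M-Instruct",
    max_seq_length=512,
    load_in_4bit=True,
)

# Attach LoRA adapters with the rank, alpha, dropout, and target modules listed above.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    lora_dropout=0.0,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
)
```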

## Intended uses & limitations

**Intended uses:**

* Conversational AI and chatbots handling Traditional Chinese queries (e.g., weather questions, FAQs).
* Instruction following in a dialogue format.

**Limitations:**

* The model's small capacity may cause occasional hallucinations or vague answers.
* Performance was measured on a 10% hold-out split; discrepancies with real-world data may reduce quality.
* 4-bit quantization and adapter-based tuning trade some accuracy for efficiency.

## Training procedure

1. **Data preparation**

   * Streamed 600k examples from the Hugging Face dataset, filtered them to `max_len=512`, cleaned assistant markers with a regex, then shuffled and split them with `Dataset.train_test_split(test_size=0.1)` (see the data-preparation sketch after this list).

2. **Model & training setup**

   * Loaded the base model with `FastLanguageModel.from_pretrained(..., load_in_4bit=True, full_finetuning=False)`.
   * Applied LoRA adapters via `FastLanguageModel.get_peft_model(...)`, as sketched under "Model description".
   * Used a `LoggingSFTTrainer` subclass to catch empty-label and NaN-loss cases during evaluation.

3. **Hyperparameters** (see the trainer sketch after this list)

   | Parameter                        | Value              |
   | -------------------------------- | -----------------: |
   | `num_train_epochs`               | 3                  |
   | `per_device_train_batch_size`    | 40                 |
   | `gradient_accumulation_steps`    | 1                  |
   | `per_device_eval_batch_size`     | 1                  |
   | `learning_rate`                  | 2e-4               |
   | `weight_decay`                   | 0.01               |
   | `warmup_steps`                   | 500                |
   | `max_seq_length`                 | 512                |
   | `evaluation_strategy`            | steps (every 100)  |
   | `eval_steps`                     | 100                |
   | `save_strategy`                  | steps (every 1000) |
   | `logging_steps`                  | 50                 |
   | `optimizer`                      | adamw_8bit         |
   | `gradient_checkpointing`         | false              |
   | `seed`                           | 3407               |
   | `EarlyStoppingCallback` patience | 4 evals            |

4. **Training & push**

   * Ran `trainer.train()`, merged the LoRA weights, then pushed the merged 16-bit model to `Luigi/SmolLM2-360M-Instruct-TaiwanChat` on Hugging Face via `model.push_to_hub_merged()` (sketched at the end of the trainer example below).
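
The data-preparation step might look roughly like the following. This is a hedged sketch: the `text` field name, the marker-cleaning regex, and the character-based length filter are all assumptions, since the card does not show the actual preprocessing code:

```python
import re
from datasets import Dataset, load_dataset

MAX_LEN = 512   # from the card
SEED = 3407     # training seed from the card, assumed reused for the split

# Stream the dataset so the 600k examples never need to fit in memory at once.
stream = load_dataset("yentinglin/TaiwanChat", split="train", streaming=True)

# Hypothetical assistant-marker pattern; the real regex is not shown on the card.
marker_re = re.compile(r"ASSISTANT:\s*")

texts, seen = [], set()
for example in stream:
    if len(texts) >= 600_000:
        break
    text = marker_re.sub("", str(example.get("text", "")))  # field name assumed
    if not text or len(text) > MAX_LEN or text in seen:     # filter + deduplicate
        continue
    seen.add(text)
    texts.append(text)

# Materialize, then shuffle and split 90/10 as the card describes.
dataset = Dataset.from_dict({"text": texts})
splits = dataset.train_test_split(test_size=0.1, shuffle=True, seed=SEED)
train_ds, eval_ds = splits["train"], splits["test"]
```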
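
Likewise, the hyperparameter table translates to a TRL trainer configuration roughly as below, using the stock `SFTTrainer` in place of the card's `LoggingSFTTrainer` subclass (whose source is not shown) and a hypothetical `output_dir`:

```python
from transformers import EarlyStoppingCallback
from trl import SFTConfig, SFTTrainer

config = SFTConfig(
    output_dir="outputs",            # hypothetical path
    max_seq_length=512,
    num_train_epochs=3,
    per_device_train_batch_size=40,
    gradient_accumulation_steps=1,
    per_device_eval_batch_size=1,
    learning_rate=2e-4,
    weight_decay=0.01,
    warmup_steps=500,
    eval_strategy="steps",           # listed as evaluation_strategy in the table
    eval_steps=100,
    save_strategy="steps",
    save_steps=1000,
    logging_steps=50,
    optim="adamw_8bit",
    gradient_checkpointing=False,
    seed=3407,
    load_best_model_at_end=True,     # required by EarlyStoppingCallback
)

trainer = SFTTrainer(
    model=model,                     # from the Model description sketch
    processing_class=tokenizer,
    train_dataset=train_ds,          # from the data-preparation sketch
    eval_dataset=eval_ds,
    args=config,
    callbacks=[EarlyStoppingCallback(early_stopping_patience=4)],
)
trainer.train()

# Merge the LoRA weights and push the 16-bit model, as described in step 4.
model.push_to_hub_merged(
    "Luigi/SmolLM2-360M-Instruct-TaiwanChat", tokenizer, save_method="merged_16bit"
)
```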

## Example inference

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the merged model (the LoRA weights are already merged into the base,
# so it loads as a plain causal LM rather than a PEFT adapter).
tokenizer = AutoTokenizer.from_pretrained("Luigi/SmolLM2-360M-Instruct-TaiwanChat")
model = AutoModelForCausalLM.from_pretrained(
    "Luigi/SmolLM2-360M-Instruct-TaiwanChat",
    torch_dtype=torch.float16,
).eval().to("cuda")

# Query: "What's the weather like in Taipei today?"
test_prompt = "請問台北今天的天氣如何?"
inputs = tokenizer(test_prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=100,
    do_sample=True,
    temperature=0.8,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
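
The snippet above feeds the raw prompt straight to the model. Since the base model is instruction-tuned, wrapping the query with the tokenizer's chat template (assuming the merged checkpoint keeps the SmolLM2-Instruct template) usually yields better-formed replies:

```python
# Format the query as a chat turn before generating.
messages = [{"role": "user", "content": test_prompt}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(input_ids, max_new_tokens=100, do_sample=True, temperature=0.8)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```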

## Framework versions

```text
apex==0.1
bitsandbytes==0.45.5
datasets==3.2.0
flash_attn==2.7.3
hatchet==1.4.0
importlib_metadata==8.6.1
lit==18.1.8
matplotlib==3.10.3
numpy==2.2.5
packaging==25.0
pandas==2.2.3
psutil==6.1.1
pybind11==2.13.6
pytest==8.1.1
redis==6.0.0
scipy==1.15.3
setuptools==79.0.0
Sphinx==8.2.3
sphinx_gallery==0.19.0
sphinx_rtd_theme==3.0.2
tabulate==0.9.0
torch==2.7.0a0+ecf3bae40a.nv25.2
transformers==4.47.1
trl==0.15.2
unsloth==2025.4.1
unsloth_zoo==2025.4.2
vllm==0.8.5.post1
wheel==0.45.1
```