+---
+license: llama3.2
+language:
+- en
+- zh
+base_model:
+- meta-llama/Llama-3.2-3B
+- lianghsun/Llama-3.2-3B-F1-Base
+library_name: transformers
+tags:
+- Taiwan
+- R.O.C
+- zhtw
+- SLM
+- Llama-32
+datasets:
+- lianghsun/tw-reasoning-instruct
+- minyichen/tw-instruct-R1-200k
+- minyichen/tw_mm_R1
+model-index:
+- name: Llama-3.2-3B-F1-Reasoning-Instruct
+  results:
+  - task:
+      type: question-answering
+      name: Single Choice Question
+    dataset:
+      type: ikala/tmmluplus
+      name: tmmlu+
+      config: all
+      split: test
+      revision: c0e8ae955997300d5dbf0e382bf0ba5115f85e8c
+    metrics:
+    - name: single choice
+      type: accuracy
+      value: 46.16
+  - task:
+      type: question-answering
+      name: Single Choice Question
+    dataset:
+      type: cais/mmlu
+      name: mmlu
+      config: all
+      split: test
+      revision: c30699e
+    metrics:
+    - name: single choice
+      type: accuracy
+      value: 51.22
+  - task:
+      type: question-answering
+      name: Single Choice Question
+    dataset:
+      type: lianghsun/tw-legal-benchmark-v1
+      name: tw-legal-benchmark-v1
+      config: all
+      split: test
+      revision: 66c3a5f
+    metrics:
+    - name: single choice
+      type: accuracy
+      value: 34.92
+metrics:
+- accuracy
+---
+# <span style="color: #7FFF7F;">Llama-3.2-3B-F1-Reasoning-Instruct GGUF Models</span>
+## <span style="color: #7F7FFF;">Model Generation Details</span>
+This model was generated using [llama.cpp](https://github.com/ggerganov/llama.cpp) at commit [`064cc596`](https://github.com/ggerganov/llama.cpp/commit/064cc596ac44308dc326a17c9e3163c34a6f29d1).
+## <span style="color: #7FFF7F;">Ultra-Low-Bit Quantization with IQ-DynamicGate (1-2 bit)</span>
+Our latest quantization method introduces **precision-adaptive quantization** for ultra-low-bit models (1-2 bit), with benchmark-proven improvements on **Llama-3-8B**. This approach uses layer-specific strategies to preserve accuracy while maintaining extreme memory efficiency.
+### **Benchmark Context**
+All tests were conducted on **Llama-3-8B-Instruct** using:
+- Standard perplexity evaluation pipeline
+- 2048-token context window
+- Same prompt set across all quantizations
+### **Method**
+- **Dynamic Precision Allocation**:
+  - First/Last 25% of layers → IQ4_XS (selected layers)
+  - Middle 50% → IQ2_XXS/IQ3_S (increase efficiency)
+- **Critical Component Protection**:
+  - Embeddings/output layers use Q5_K
+  - Reduces error propagation by 38% vs standard 1-2bit
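The allocation rule above can be illustrated with a small sketch. This is not the llama.cpp implementation: `pick_quant_type` and its parameters are hypothetical names, the sketch assigns IQ4_XS to every layer in the outer bands (the method only says "selected layers"), and the default middle type is an assumption chosen from the IQ2_XXS/IQ3_S pair listed above.

```python
def pick_quant_type(layer_idx: int, n_layers: int, middle_type: str = "IQ2_XXS") -> str:
    """Return a quant type for one transformer layer under a 25% / 50% / 25% split."""
    if not 0 <= layer_idx < n_layers:
        raise ValueError("layer_idx out of range")
    edge = n_layers // 4  # number of layers in the first band and in the last band
    if layer_idx < edge or layer_idx >= n_layers - edge:
        return "IQ4_XS"   # outer bands keep higher precision
    return middle_type    # middle band is compressed harder


# Llama-3-8B has 32 transformer layers.
plan = [pick_quant_type(i, 32) for i in range(32)]
print(plan.count("IQ4_XS"), "layers at IQ4_XS,", plan.count("IQ2_XXS"), "layers at IQ2_XXS")
```

Embeddings and output layers are handled separately (Q5_K in the method above), so they are left out of this per-layer plan.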
+### **Quantization Performance Comparison (Llama-3-8B)**
+| Quantization | Standard PPL | DynamicGate PPL | Δ PPL   | Std Size | DG Size | Δ Size | Std Speed | DG Speed |
+|--------------|--------------|------------------|---------|----------|---------|--------|-----------|----------|
+| IQ2_XXS      | 11.30        | 9.84             | -12.9%  | 2.5G     | 2.6G    | +0.1G  | 234s      | 246s     |
+| IQ2_XS       | 11.72        | 11.63            | -0.8%   | 2.7G     | 2.8G    | +0.1G  | 242s      | 246s     |
+| IQ2_S        | 14.31        | 9.02             | -36.9%  | 2.7G     | 2.9G    | +0.2G  | 238s      | 244s     |
+| IQ1_M        | 27.46        | 15.41            | -43.9%  | 2.2G     | 2.5G    | +0.3G  | 206s      | 212s     |
+| IQ1_S        | 53.07        | 32.00            | -39.7%  | 2.1G     | 2.4G    | +0.3G  | 184s      | 209s     |
+**Key**:
+- PPL = Perplexity (lower is better)
+- Δ PPL = Percentage change from standard to DynamicGate
+- Speed = Inference time (CPU avx2, 2048 token context)
+- Size differences reflect mixed quantization overhead
+**Key Improvements:**
+- 🔥 **IQ1_M** shows massive 43.9% perplexity reduction (27.46 → 15.41)
+- 🚀 **IQ2_S** cuts perplexity by 36.9% while adding only 0.2GB
+- ⚡ **IQ1_S** achieves 39.7% lower perplexity (53.07 → 32.00) despite 1-bit quantization
+**Tradeoffs:**
+- All variants have modest size increases (0.1-0.3GB)
+- Inference is roughly 2-5% slower for most variants; IQ1_S is the outlier at about 14% slower (184s → 209s)
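The percentage columns can be recomputed from the raw table values. The sketch below is only a sanity check on the published numbers; `ROWS` and `pct_change` are names introduced here for illustration.

```python
# (standard PPL, DynamicGate PPL, standard time in s, DynamicGate time in s)
ROWS = {
    "IQ2_XXS": (11.30, 9.84, 234, 246),
    "IQ2_XS": (11.72, 11.63, 242, 246),
    "IQ2_S": (14.31, 9.02, 238, 244),
    "IQ1_M": (27.46, 15.41, 206, 212),
    "IQ1_S": (53.07, 32.00, 184, 209),
}


def pct_change(standard: float, dynamic: float) -> float:
    """Signed percentage change from the standard value to the DynamicGate value."""
    return (dynamic - standard) / standard * 100.0


for name, (std_ppl, dg_ppl, std_s, dg_s) in ROWS.items():
    print(f"{name}: PPL {pct_change(std_ppl, dg_ppl):+.1f}%, time {pct_change(std_s, dg_s):+.1f}%")
```

A negative perplexity change is an improvement, while a positive time change is a slowdown. A last-digit difference from the table (for example on IQ2_S) would be consistent with the table truncating rather than rounding.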
+### **When to Use These Models**
+📌 **Fitting models into GPU VRAM**
+✔ **Memory-constrained deployments**
+✔ **CPU and edge devices** where 1-2 bit errors can be tolerated
+✔ **Research** into ultra-low-bit quantization
+## **Choosing the Right Model Format**
+Selecting the correct model format depends on your **hardware capabilities** and **memory constraints**.
+### **BF16 (Brain Float 16) – Use if BF16 acceleration is available**
+- A 16-bit floating-point format designed for **faster computation** while retaining good precision.
+- Provides a **dynamic range similar to FP32** but with **lower memory usage**.
+- Recommended if your hardware supports **BF16 acceleration** (check your device's specs).
+- Ideal for **high-performance inference** with **reduced memory footprint** compared to FP32.
+📌 **Use BF16 if:**
+✔ Your hardware has native **BF16 support** (e.g., newer GPUs, TPUs).
+✔ You want **higher precision** while saving memory.
+✔ You plan to **requantize** the model into another format.
+📌 **Avoid BF16 if:**
+❌ Your hardware does **not** support BF16 (it may fall back to FP32 and run slower).
+❌ You need compatibility with older devices that lack BF16 optimization.
+---
+### **F16 (Float 16) – More widely supported than BF16**
+- A 16-bit floating-point format with **higher precision** than BF16 but a narrower range of values.
+- Works on most devices with **FP16 acceleration support** (including many GPUs and some CPUs).
+- Narrower dynamic range than BF16 (5 exponent bits versus 8), but generally sufficient for inference.
+📌 **Use F16 if:**
+✔ Your hardware supports **FP16** but **not BF16**.
+✔ You need a **balance between speed, memory usage, and accuracy**.
+✔ You are running on a **GPU** or another device optimized for FP16 computations.
+📌 **Avoid F16 if:**
+❌ Your device lacks **native FP16 support** (it may run slower than expected).
+❌ You have memory limitations.
+---
+### **Quantized Models (Q4_K, Q6_K, Q8, etc.) – For CPU & Low-VRAM Inference**
+Quantization reduces model size and memory usage while maintaining as much accuracy as possible.
+- **Lower-bit models (Q4_K)** → **Best for minimal memory usage**, may have lower precision.
+- **Higher-bit models (Q6_K, Q8_0)** → **Better accuracy**, requires more memory.
+📌 **Use Quantized Models if:**
+✔ You are running inference on a **CPU** and need an optimized model.
+✔ Your device has **low VRAM** and cannot load full-precision models.
+✔ You want to reduce **memory footprint** while keeping reasonable accuracy.
+📌 **Avoid Quantized Models if:**
+❌ You need **maximum accuracy** (full-precision models are better for this).
+❌ Your hardware has enough VRAM for higher-precision formats (BF16/F16).
+---
+### **Very Low-Bit Quantization (IQ3_XS, IQ3_S, IQ3_M, Q4_K, Q4_0)**
+These models are optimized for **extreme memory efficiency**, making them ideal for **low-power devices** or **large-scale deployments** where memory is a critical constraint.
+- **IQ3_XS**: Ultra-low-bit quantization (3-bit) with **extreme memory efficiency**.
+  - **Use case**: Best for **ultra-low-memory devices** where even Q4_K is too large.
+  - **Trade-off**: Lower accuracy compared to higher-bit quantizations.
+- **IQ3_S**: Small block size for **maximum memory efficiency**.
+  - **Use case**: Best for **low-memory devices** where **IQ3_XS** is too aggressive.
+- **IQ3_M**: Medium block size for better accuracy than **IQ3_S**.
+  - **Use case**: Suitable for **low-memory devices** where **IQ3_S** is too limiting.
+- **Q4_K**: 4-bit quantization with **block-wise optimization** for better accuracy.
+  - **Use case**: Best for **low-memory devices** where **Q6_K** is too large.
+- **Q4_0**: Pure 4-bit quantization, optimized for **ARM devices**.
+  - **Use case**: Best for **ARM-based devices** or **low-memory environments**.
+---
+### **Summary Table: Model Format Selection**
+| Model Format  | Precision  | Memory Usage  | Device Requirements  | Best Use Case  |
+|--------------|------------|---------------|----------------------|---------------|
+| **BF16**     | Highest    | High          | BF16-supported GPU/CPUs  | High-speed inference with reduced memory |
+| **F16**      | High       | High          | FP16-supported devices | GPU inference when BF16 isn't available |
+| **Q4_K**     | Medium Low | Low           | CPU or Low-VRAM devices | Best for memory-constrained environments |
+| **Q6_K**     | Medium     | Moderate      | CPU with more memory | Better accuracy while still being quantized |
+| **Q8_0**     | High       | Moderate      | CPU or GPU with enough VRAM | Best accuracy among quantized models |
+| **IQ3_XS**   | Very Low   | Very Low      | Ultra-low-memory devices | Extreme memory efficiency and low accuracy |
+| **Q4_0**     | Low        | Low           | ARM or low-memory devices | llama.cpp can optimize for ARM devices |
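The guidance above can be condensed into a small decision helper. This is only a sketch of the rules of thumb in this section, not a tool shipped with the model: `choose_format` and its flags are hypothetical, and falling back to Q8_0 when no 16-bit acceleration is available is an assumption based on the summary table.

```python
def choose_format(has_bf16: bool, has_fp16: bool,
                  low_memory: bool, very_low_memory: bool = False) -> str:
    """Map coarse hardware facts to one of the formats in the summary table."""
    if very_low_memory:
        return "IQ3_XS"  # extreme memory efficiency, lowest accuracy
    if low_memory:
        return "Q4_K"    # memory-constrained CPU or low-VRAM devices
    if has_bf16:
        return "BF16"    # native BF16 acceleration available
    if has_fp16:
        return "F16"     # FP16 acceleration but no BF16
    return "Q8_0"        # assumption: best accuracy among the quantized options


print(choose_format(has_bf16=False, has_fp16=True, low_memory=False))
```

Memory limits are checked first because a format that does not fit cannot be used, whatever the hardware accelerates.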
+---
+## **Included Files & Details**
+### `Llama-3.2-3B-F1-Reasoning-Instruct-bf16.gguf`
+- Model weights preserved in **BF16**.
+- Use this if you want to **requantize** the model into a different format.
+- Best if your device supports **BF16 acceleration**.
+### `Llama-3.2-3B-F1-Reasoning-Instruct-f16.gguf`
+- Model weights stored in **F16**.
+- Use if your device supports **FP16**, especially if BF16 is not available.
+### `Llama-3.2-3B-F1-Reasoning-Instruct-bf16-q8_0.gguf`
+- **Output & embeddings** remain in **BF16**.
+- All other layers quantized to **Q8_0**.
+- Use if your device supports **BF16** and you want a quantized version.
+### `Llama-3.2-3B-F1-Reasoning-Instruct-f16-q8_0.gguf`
+- **Output & embeddings** remain in **F16**.
+- All other layers quantized to **Q8_0**.
+### `Llama-3.2-3B-F1-Reasoning-Instruct-q4_k.gguf`
+- **Output & embeddings** quantized to **Q8_0**.
+- All other layers quantized to **Q4_K**.
+- Good for **CPU inference** with limited memory.
+### `Llama-3.2-3B-F1-Reasoning-Instruct-q4_k_s.gguf`
+- Smallest **Q4_K** variant, using less memory at the cost of accuracy.
+- Best for **very low-memory setups**.
+### `Llama-3.2-3B-F1-Reasoning-Instruct-q6_k.gguf`
+- **Output & embeddings** quantized to **Q8_0**.
+- All other layers quantized to **Q6_K**.
+### `Llama-3.2-3B-F1-Reasoning-Instruct-q8_0.gguf`
+- Fully **Q8** quantized model for better accuracy.
+- Requires **more memory** but offers higher precision.
+### `Llama-3.2-3B-F1-Reasoning-Instruct-iq3_xs.gguf`
+- **IQ3_XS** quantization, optimized for **extreme memory efficiency**.
+- Best for **ultra-low-memory devices**.
+### `Llama-3.2-3B-F1-Reasoning-Instruct-iq3_m.gguf`
+- **IQ3_M** quantization, offering a **medium block size** for better accuracy.
+- Suitable for **low-memory devices**.
+### `Llama-3.2-3B-F1-Reasoning-Instruct-q4_0.gguf`
+- Pure **Q4_0** quantization, optimized for **ARM devices**.
+- Best for **low-memory environments**.
+- Prefer IQ4_NL for better accuracy.
+# <span id="testllm" style="color: #7F7FFF;">🚀 If you find these models useful</span>
+❤ **Please click "Like" if you find this useful!**
+Help me test my **AI-Powered Network Monitor Assistant** with **quantum-ready security checks**:
+👉 [Free Network Monitor](https://readyforquantum.com/dashboard/?assistant=open)
+💬 **How to test**:
+Choose an **AI assistant type**:
+   - `TurboLLM` (GPT-4o-mini)
+   - `HugLLM` (Hugging Face open-source models)
+   - `TestLLM` (Experimental CPU-only)
+### **What I’m Testing**
+I’m pushing the limits of **small open-source models for AI network monitoring**, specifically:
+- **Function calling** against live network services
+- **How small can a model go** while still handling:
+  - Automated **Nmap scans**
+  - **Quantum-readiness checks**
+  - **Network Monitoring tasks**
+🟡 **TestLLM** – Current experimental model (llama.cpp on 2 CPU threads):
+- ✅ **Zero-configuration setup**
+- ⏳ 30s load time (slow inference but **no API costs**)
+- 🔧 **Help wanted!** If you’re into **edge-device AI**, let’s collaborate!
+### **Other Assistants**
+🟢 **TurboLLM** – Uses **gpt-4o-mini** for:
+- **Creating custom cmd processors to run .NET code on Free Network Monitor Agents**
+- **Real-time network diagnostics and monitoring**
+- **Security Audits**
+- **Penetration testing** (Nmap/Metasploit)
+- 🔑 Get more tokens by logging in or [downloading our Free Network Monitor Agent with integrated AI Assistant](https://readyforquantum.com/download)
+🔵 **HugLLM** – Latest Open-source models:
+- 🌐 Runs on Hugging Face Inference API
+### 💡 **Example commands you could test**:
+1. `"Give me info on my website's SSL certificate"`
+2. `"Check if my server is using quantum safe encryption for communication"`
+3. `"Run a comprehensive security audit on my server"`
+4. `"Create a cmd processor to .. (whatever you want)"` Note that you need to install a Free Network Monitor Agent to run the .NET code. This is a very flexible and powerful feature. Use with caution!
+# Model Card for Llama-3.2-3B-F1-Reasoning-Instruct (a.k.a __Formosa-1-Reasoning__ or __F1-Reasoning__)
+<div align="center" style="line-height: 1;">
+  <a href="https://discord.gg/Cx737yw4ed" target="_blank" style="margin: 2px;">
+    <img alt="Discord" src="https://img.shields.io/badge/Discord-Twinkle%20AI-7289da?logo=discord&logoColor=white&color=7289da" style="display: inline-block; vertical-align: middle;"/>
+  </a>
+  <a href="https://huggingface.co/twinkle-ai" target="_blank" style="margin: 2px;">
+    <img alt="Hugging Face" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Twinkle%20AI-ffc107?color=ffc107&logoColor=white" style="display: inline-block; vertical-align: middle;"/>
+  </a>
+</div>
+<div align="center" style="line-height: 1;">
+  <a href="https://huggingface.co/meta-llama/Llama-3.2-1B/blob/main/LICENSE.txt" style="margin: 2px;">
+    <img alt="License" src="https://img.shields.io/badge/License-llama3.2-f5de53?&color=0081fb" style="display: inline-block; vertical-align: middle;"/>
+  </a>
+</div>
+![image/png](https://cdn-uploads.huggingface.co/production/uploads/618dc56cbc345ca7bf95f3cd/lBonfNs_7lzYguD4kJo6z.png)
+<!-- Provide a quick summary of what the model is/does. -->
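+**Llama-3.2-3B-F1-Reasoning-Instruct** (a.k.a. **Formosa-1-Reasoning** or **F1-Reasoning**) is a Traditional Chinese language model developed by **[Twinkle AI](https://huggingface.co/twinkle-ai)** in collaboration with **[APMIC](https://www.apmic.ai/)**, under the technical guidance of the [National Center for High-performance Computing](https://www.nchc.org.tw/). It is fine-tuned for the linguistic context and task requirements of Taiwan (R.O.C.), covers scenarios such as law, education, and everyday applications, and is tuned for strong instruction following.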
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [Liang Hsun Huang](https://huggingface.co/lianghsun), [Min Yi Chen](https://huggingface.co/minyichen), [Wen Bin Lin](https://huggingface.co/tedslin), [Chao Chun Chuang](https://huggingface.co/c00cjz00) & [Dave Sung](https://huggingface.co/k1dave6412) (All authors have contributed equally to this work.)
+- **Funded by:** [APMIC](https://www.apmic.ai/)
+- **Model type:** LlamaForCausalLM
+- **Language(s) (NLP):** Traditional Chinese & English
+- **License:** [llama3.2](https://huggingface.co/meta-llama/Llama-3.2-1B/blob/main/LICENSE.txt)
+### Model Sources
+<!-- Provide the basic links for the model. -->
+- **Repository:** [twinkle-ai/Llama-3.2-3B-F1-Reasoning-Instruct](https://huggingface.co/twinkle-ai/Llama-3.2-3B-F1-Reasoning-Instruct)
+- **Paper:** (TBA)
+- **Demo:** [Playground](https://3b02.coolify.apmic.ai/)
+## Evaluation
+### Results
+The table below was produced with the [🌟 Twinkle Eval](https://github.com/ai-twinkle/Eval) evaluation framework.
+| Model                              | Eval mode | TMMLU+ (%)     | Taiwan Legal (%) | MMLU (%)       | Runs    | Option order |
+|------------------------------------|---------|----------------|----------------|----------------|---------|---------|
+| [mistralai/Mistral-Small-24B-Instruct-2501](https://huggingface.co/mistralai/Mistral-Small-24B-Instruct-2501) | box     | 56.15 (±0.0172) | 37.48 (±0.0098) | 74.61 (±0.0154) | 3       | Random  |
+| [meta-llama/Llama-3.2-3B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct)   | box     | 15.49 (±0.0104) | 25.68 (±0.0200) | 6.90 (±0.0096) | 3       | Random  |
+| [meta-llama/Llama-3.2-3B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct)   | pattern | 35.85 (±0.0174) | 32.22 (±0.0023) | 59.33 (±0.0168) | 3       | Random  |
+| [MediaTek-Research/Llama-Breeze2-3B-Instruct](https://huggingface.co/MediaTek-Research/Llama-Breeze2-3B-Instruct) | pattern | 40.32 (±0.0181) | 38.92 (±0.0193) | 55.37 (±0.0180) | 3       | Random  |
+| [twinkle-ai/Llama-3.2-3B-F1-Reasoning-Instruct](https://huggingface.co/twinkle-ai/Llama-3.2-3B-F1-Reasoning-Instruct) (ours) | box | 46.16 (±0.0198) | 34.92 (±0.0243) | 51.22 (±0.0206) | 3       | Random  |
+The table below was produced with the lighteval evaluation framework.
+| Model                                      | MATH-500 | GPQA Diamond |
+|--------------------------------------------|----------|--------------|
+| [meta-llama/Llama-3.2-3B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct)                       | 44.40    | 27.78        |
+| [twinkle-ai/Llama-3.2-3B-F1-Reasoning-Instruct](https://huggingface.co/twinkle-ai/Llama-3.2-3B-F1-Reasoning-Instruct) (ours)   | **51.40**| **33.84**    |
+---
+## 🔧 Tool Calling
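+This model was trained with the Hermes tool-call format and supports parallel calling. A complete example workflow follows.
+The tool-call template is already built into the chat template, so no extra setup is needed.
+### 1️⃣ Start the vLLM backend
+> **⚠️ Note: vLLM >= 0.8.3 is required; otherwise `enable-reasoning` and `enable-auto-tool-choice` cannot be enabled at the same time.**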
+```bash
+vllm serve twinkle-ai/Llama-3.2-3B-F1-Reasoning-Instruct \
+  --port 8001 \
+  --enable-reasoning \
+  --reasoning-parser deepseek_r1 \
+  --enable-auto-tool-choice \
+  --tool-call-parser hermes
+```
+### 2️⃣ Define the tools (functions)
+```python
+def get_weather(location: str, unit: str):
+    return f"{location}的氣溫是{unit}26度，晴朗無風"
+def search(query: str):
+    return "川普終於宣布對等關稅政策，針對 18 個經濟體課徵一半的對等關稅，並從 4/5 起對所有進口產品徵收10%的基準關稅！美國將針對被認定為不當貿易行為(不公平貿易) 的國家，於 4/9 起課徵報復型對等關稅 (Discounted Reciprocal Tariff)，例如：日本將被課徵 24% 的關稅，歐盟則為 20%，以取代普遍性的 10% 關稅。\n針對中國則開啟新一波 34% 關稅，並疊加於先前已實施的關稅上，這將使中國進口商品的基本關稅稅率達到 54%，而且這尚未包含拜登總統任內或川普第一任期所施加的額外關稅。加拿大與墨西哥則不適用這套對等關稅制度，但川普認為這些國家在芬太尼危機與非法移民問題尚未完全解決，因此計畫對這兩國的大多數進口商品施加 25% 關稅。另外原本針對汽車與多數其他商品的關稅豁免將於 4/2 到期。\n台灣的部分，美國擬向台灣課徵32％的對等關稅，雖然並未針對晶片特別課徵關稅，但仍在記者會中提到台灣搶奪所有的電腦與半導體晶片，最終促成台積電對美國投資計劃額外加碼 1,000 億美元的歷史性投資；歐盟則課徵20％的對等關稅。最後是汽車關稅將於 4/2 起，對所有外國製造的汽車課徵25% 關稅。"
+tools = [
+    {
+        "type": "function",
+        "function": {
+            "name": "get_weather",
+            "description": "Get the current weather in a given location",
+            "parameters": {
+                "type": "object",
+                "properties": {
+                    "location": {"type": "string", "description": "國家或城市名, e.g., 'Taipei'、'Jaipei'"},
+                    "unit": {"type": "string", "description": "氣溫單位，亞洲城市使用攝氏；歐美城市使用華氏", "enum": ["celsius", "fahrenheit"]}
+                },
+                "required": ["location", "unit"]
+            }
+        }
+    },
+    {
+        "type": "function",
+        "function": {
+            "name": "search",
+            "description": "這是一個類似 Google 的搜尋引擎，關於知識、天氣、股票、電影、小說、百科等等問題，如果你不確定答案就搜尋一下。",
+            "parameters": {
+                "type": "object",
+                "properties": {
+                    "query": {"type": "string", "description": "should be a search query, e.g., '2024 南韓 戒嚴'"}
+                },
+                "required": ["query"]
+            }
+        }
+    }
+]
+```
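+### 3️⃣ Run the tool calls
+> **⚠️ Note: the system prompt is optional unless a tool needs a reference date.**
+```python
+from openai import OpenAI
+
+# Assumption: the vLLM server from step 1 is reachable on localhost:8001;
+# vLLM ignores the API key unless it was started with --api-key.
+client = OpenAI(base_url="http://localhost:8001/v1", api_key="EMPTY")
+
+response = client.chat.completions.create(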
+    model=client.models.list().data[0].id,
+    messages=[
+        {"role": "system", "content": "記住你的知識截止於 2024/12，今天是 2025/4/7"},
+        {"role": "user", "content": "台北氣溫如何? 另外，告訴我川普最新關稅政策"},
+    ],
+    max_tokens=1500,
+    temperature=0.6,
+    top_p=0.95,
+    tools=tools,
+    tool_choice="auto"
+)
+print(response.choices[0].message.reasoning_content)
+print(response.choices[0].message.tool_calls)
+```
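+#### 🧠 Reasoning output (excerpt, translated from Chinese)
+> Okay, I need to help this user with their question. They asked two things: first, the weather in Taipei, and second, Trump's recent tariff policy.
+> For the first part, they mentioned "Taipei", so I should call the get_weather function…
+> Next is Trump's new tariff policy…
+> To sum up, I need to make two separate API calls, each with its own correctly filled-in parameters…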
+#### ⚙️ Tool Calls List
+```text
+[ChatCompletionMessageToolCall(id='chatcmpl-tool-35e74420119349999913a10133b84bd3', function=Function(arguments='{"location": "Taipei", "unit": "celsius"}', name='get_weather'), type='function'), ChatCompletionMessageToolCall(id='chatcmpl-tool-7ffdcb98e59f4134a6171defe7f2e31b', function=Function(arguments='{"query": "Donald Trump latest tariffs policy"}', name='search'), type='function')]
+```
+### 4️⃣ Generate the final answer
+```python
+import json
+
+response = client.chat.completions.create(
+    model=client.models.list().data[0].id,
+    messages=[
+        {"role": "system", "content": "記住你的知識截止於 2024/12，今天是 2025/4/7"},
+        {"role": "user", "content": "台北氣溫如何? 另外，告訴我川普最新關稅政策"},
+        {
+            "role": "assistant",
+            "content": "",
+            "tool_calls": [
+                {
+                    "id": response.choices[0].message.tool_calls[0].id,
+                    "type": "function",
+                    "function": {
+                        "name": response.choices[0].message.tool_calls[0].function.name,
+                        "arguments": response.choices[0].message.tool_calls[0].function.arguments
+                    }
+                },
+                {
+                    "id": response.choices[0].message.tool_calls[1].id,
+                    "type": "function",
+                    "function": {
+                        "name": response.choices[0].message.tool_calls[1].function.name,
+                        "arguments": response.choices[0].message.tool_calls[1].function.arguments
+                    }
+                }
+            ]
+        },
+        {
+            "role": "tool",
+            # tool_calls[0] is the get_weather call (see the Tool Calls List above)
+            "content": get_weather(**json.loads(response.choices[0].message.tool_calls[0].function.arguments)),
+            "tool_call_id": response.choices[0].message.tool_calls[0].id # tool_call_id is required so each result is matched to its tool_call
+        },
+        {
+            "role": "tool",
+            # tool_calls[1] is the search call
+            "content": search(**json.loads(response.choices[0].message.tool_calls[1].function.arguments)),
+            "tool_call_id": response.choices[0].message.tool_calls[1].id # tool_call_id is required so each result is matched to its tool_call
+        }
+    ],
+    max_tokens=1500,
+    temperature=0.6,
+    top_p=0.95,
+    tools=tools,
+    tool_choice="auto"
+)
+print(response.choices[0].message.reasoning_content)
+print(response.choices[0].message.content)
+```
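Indexing `tool_calls` by position only works when the model returns the calls in a known order. A more general option is to dispatch on the function name. The sketch below is one possible way to do that and is not part of the original example: `build_tool_messages`, `demo_weather`, `demo_search`, and the stand-in call objects are hypothetical names used so that the snippet runs without a server.

```python
import json
from types import SimpleNamespace


def build_tool_messages(tool_calls, registry):
    """Run each requested tool by name and return the matching `tool` messages."""
    messages = []
    for call in tool_calls:
        func = registry[call.function.name]          # look up the Python function
        kwargs = json.loads(call.function.arguments)  # arguments arrive as a JSON string
        messages.append({
            "role": "tool",
            "content": func(**kwargs),
            "tool_call_id": call.id,                  # pairs the result with its call
        })
    return messages


# Stand-ins for real tools and for the objects in response.choices[0].message.tool_calls.
def demo_weather(location: str, unit: str) -> str:
    return f"{location}: 26 degrees ({unit})"


def demo_search(query: str) -> str:
    return f"results for {query}"


fake_calls = [
    SimpleNamespace(id="call-1", function=SimpleNamespace(
        name="get_weather", arguments='{"location": "Taipei", "unit": "celsius"}')),
    SimpleNamespace(id="call-2", function=SimpleNamespace(
        name="search", arguments='{"query": "latest tariff policy"}')),
]
tool_messages = build_tool_messages(
    fake_calls, {"get_weather": demo_weather, "search": demo_search})
print(tool_messages)
```

With a live server, the same helper could be given `response.choices[0].message.tool_calls` and a registry of the real `get_weather` and `search` functions, and its result appended after the assistant message.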
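+#### 🧠 Reasoning output (excerpt, translated from Chinese)
+> First, I need to handle the user's query: they want to know the current temperature in Taipei and Trump's recent tariff policy…
+> After calling `get_weather`, I learned that the temperature in Taipei is 26 degrees (Celsius)…
+> Next, I used `search` to look up "Trump latest tariff policy 2025"…
+> Putting it together, I provide the following summary:
+#### 📋 Final output (translated from Chinese)
+```text
+Here is the information you requested:
+**Taipei temperature**
+- Current temperature: **26°C** (Celsius)
+- Conditions: sunny with no wind
+**Overview of Trump's latest tariff policy**
+1. **Reciprocal tariff policy**
+   - A 50% reciprocal tariff on 18 economies
+   - Starting April 5, a 10% baseline tariff on all imported products
+2. **Retaliatory reciprocal tariffs**
+   - Japan 24%, EU 20%
+3. **High tariffs on China**
+   - Raised to 54% (existing tariffs + an additional 34%)
+4. **Special cases**
+   - Canada and Mexico are exempt, but other goods are subject to a 25% tariff
+   - Tariff exemptions for automobiles and some goods are about to expire
+5. **Impact on Taiwan**
+   - The US plans a 32% tariff on Taiwan, with no additional tariff on chips for now
+6. **Global perspective**
+   - Tariff rates for the EU and Japan are relatively high
+```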
+## Citation
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+```bibtex
+@misc{twinkleai2025llama3.2f1,
+  title        = {Llama-3.2-3B-F1-Reasoning-Instruct: A Traditional Chinese Instruction-Tuned Reasoning Language Model for Taiwan},
+  author       = {Huang, Liang Hsun and Chen, Min Yi and Lin, Wen Bin and Chuang, Chao Chun and Sung, Dave},
+  year         = {2025},
+  howpublished = {\url{https://huggingface.co/twinkle-ai/Llama-3.2-3B-F1-Instruct}},
+  note         = {Twinkle AI and APMIC. All authors contributed equally.}
+}
+```
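+## Acknowledgements
+- Special thanks to the [National Center for High-performance Computing](https://www.nchc.org.tw/) for its guidance and to [APMIC](https://www.apmic.ai/) for compute support, which made it possible to complete this project.
+- Thanks also to teacher 黃啟聖, 許武龍 (哈爸), teacher 陳姿燁 of the physics department at Taipei First Girls High School, Howard (CTO of [奈視科技](https://nanoseex.com/)), [AIPLUX Technology](https://aiplux.com/), teacher 郭家嘉, and everyone who provided valuable help during dataset creation.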
+## Model Card Authors
+[Twinkle AI](https://huggingface.co/twinkle-ai)
+## Model Card Contact
+[Twinkle AI](https://huggingface.co/twinkle-ai)