|
--- |
|
license: llama3.2 |
|
language: |
|
- en |
|
- zh |
|
base_model: |
|
- meta-llama/Llama-3.2-3B |
|
- lianghsun/Llama-3.2-3B-F1-Base |
|
library_name: transformers |
|
tags: |
|
- Taiwan |
|
- R.O.C |
|
- zhtw |
|
- SLM |
|
- Llama-32 |
|
datasets: |
|
- lianghsun/tw-reasoning-instruct |
|
- minyichen/tw-instruct-R1-200k |
|
- minyichen/tw_mm_R1 |
|
model-index: |
|
- name: Llama-3.2-3B-F1-Reasoning-Instruct |
|
results: |
|
- task: |
|
type: question-answering |
|
name: Single Choice Question |
|
dataset: |
|
type: ikala/tmmluplus |
|
name: tmmlu+ |
|
config: all |
|
split: test |
|
revision: c0e8ae955997300d5dbf0e382bf0ba5115f85e8c |
|
metrics: |
|
- name: single choice |
|
type: accuracy |
|
value: 46.16 |
|
- task: |
|
type: question-answering |
|
name: Single Choice Question |
|
dataset: |
|
type: cais/mmlu |
|
name: mmlu |
|
config: all |
|
split: test |
|
revision: c30699e |
|
metrics: |
|
- name: single choice |
|
type: accuracy |
|
value: 51.22 |
|
- task: |
|
type: question-answering |
|
name: Single Choice Question |
|
dataset: |
|
type: lianghsun/tw-legal-benchmark-v1 |
|
name: tw-legal-benchmark-v1 |
|
config: all |
|
split: test |
|
revision: 66c3a5f |
|
metrics: |
|
- name: single choice |
|
type: accuracy |
|
value: 34.92 |
|
metrics: |
|
- accuracy |
|
--- |
|
|
|
# <span style="color: #7FFF7F;">Llama-3.2-3B-F1-Reasoning-Instruct GGUF Models</span> |
|
|
|
|
|
## <span style="color: #7F7FFF;">Model Generation Details</span> |
|
|
|
This model was generated using [llama.cpp](https://github.com/ggerganov/llama.cpp) at commit [`064cc596`](https://github.com/ggerganov/llama.cpp/commit/064cc596ac44308dc326a17c9e3163c34a6f29d1). |
|
|
|
|
|
|
|
|
|
## <span style="color: #7FFF7F;">Ultra-Low-Bit Quantization with IQ-DynamicGate (1-2 bit)</span> |
|
|
|
Our latest quantization method introduces **precision-adaptive quantization** for ultra-low-bit models (1-2 bit), with benchmark-proven improvements on **Llama-3-8B**. This approach uses layer-specific strategies to preserve accuracy while maintaining extreme memory efficiency. |
|
|
|
### **Benchmark Context** |
|
All tests conducted on **Llama-3-8B-Instruct** using: |
|
- Standard perplexity evaluation pipeline |
|
- 2048-token context window |
|
- Same prompt set across all quantizations |
|
|
|
### **Method** |
|
- **Dynamic Precision Allocation**: |
|
- First/Last 25% of layers → IQ4_XS (selected layers) |
|
- Middle 50% → IQ2_XXS/IQ3_S (increase efficiency) |
|
- **Critical Component Protection**: |
|
- Embeddings/output layers use Q5_K |
|
- Reduces error propagation by 38% vs standard 1-2bit |
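The allocation above can be pictured as a simple rule over the layer index. The sketch below is illustrative only; the function name and thresholds are ours, not the actual llama.cpp implementation.

```python
def choose_quant_type(layer_idx: int, n_layers: int) -> str:
    """Illustrative layer-wise precision rule (not the actual llama.cpp code).

    First/last 25% of layers keep higher precision (IQ4_XS); the middle 50%
    are pushed to ultra-low-bit types. Embeddings/output layers are handled
    separately (Q5_K) and are not covered here.
    """
    position = layer_idx / max(n_layers - 1, 1)  # 0.0 = first layer, 1.0 = last layer
    if position <= 0.25 or position >= 0.75:
        return "IQ4_XS"   # boundary layers: protect accuracy
    return "IQ2_XXS"      # middle layers: maximize memory savings
```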
|
|
|
### **Quantization Performance Comparison (Llama-3-8B)** |
|
|
|
| Quantization | Standard PPL | DynamicGate PPL | Δ PPL | Std Size | DG Size | Δ Size | Std Speed | DG Speed |
|--------------|--------------|------------------|---------|----------|---------|--------|-----------|----------|
| IQ2_XXS | 11.30 | 9.84 | -12.9% | 2.5G | 2.6G | +0.1G | 234s | 246s |
| IQ2_XS | 11.72 | 11.63 | -0.8% | 2.7G | 2.8G | +0.1G | 242s | 246s |
| IQ2_S | 14.31 | 9.02 | -36.9% | 2.7G | 2.9G | +0.2G | 238s | 244s |
| IQ1_M | 27.46 | 15.41 | -43.9% | 2.2G | 2.5G | +0.3G | 206s | 212s |
| IQ1_S | 53.07 | 32.00 | -39.7% | 2.1G | 2.4G | +0.3G | 184s | 209s |
|
|
|
**Key**: |
|
- PPL = Perplexity (lower is better) |
|
- Δ PPL = Percentage change from standard to DynamicGate |
|
- Speed = Inference time (CPU avx2, 2048 token context) |
|
- Size differences reflect mixed quantization overhead |
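For example, the IQ2_XXS row works out to (9.84 - 11.30) / 11.30 ≈ -12.9%, matching the Δ PPL column.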
|
|
|
**Key Improvements:** |
|
- 🔥 **IQ1_M** shows massive 43.9% perplexity reduction (27.46 → 15.41) |
|
- 🚀 **IQ2_S** cuts perplexity by 36.9% while adding only 0.2GB |
|
- ⚡ **IQ1_S** shows a 39.7% perplexity reduction despite 1-bit quantization
|
|
|
**Tradeoffs:** |
|
- All variants have modest size increases (0.1-0.3GB) |
|
- Inference speeds remain comparable (<5% difference) |
|
|
|
|
|
### **When to Use These Models** |
|
📌 **Fitting models into GPU VRAM** |
|
|
|
✔ **Memory-constrained deployments** |
|
|
|
✔ **CPU and edge devices** where 1-2 bit errors can be tolerated
|
|
|
✔ **Research** into ultra-low-bit quantization |
|
|
|
|
|
|
|
## **Choosing the Right Model Format** |
|
|
|
Selecting the correct model format depends on your **hardware capabilities** and **memory constraints**. |
|
|
|
### **BF16 (Brain Float 16) – Use if BF16 acceleration is available** |
|
- A 16-bit floating-point format designed for **faster computation** while retaining good precision. |
|
- Provides **similar dynamic range** as FP32 but with **lower memory usage**. |
|
- Recommended if your hardware supports **BF16 acceleration** (check your device's specs). |
|
- Ideal for **high-performance inference** with **reduced memory footprint** compared to FP32. |
|
|
|
📌 **Use BF16 if:** |
|
✔ Your hardware has native **BF16 support** (e.g., newer GPUs, TPUs). |
|
✔ You want **higher precision** while saving memory. |
|
✔ You plan to **requantize** the model into another format. |
|
|
|
📌 **Avoid BF16 if:** |
|
❌ Your hardware does **not** support BF16 (it may fall back to FP32 and run slower). |
|
❌ You need compatibility with older devices that lack BF16 optimization. |
|
|
|
--- |
|
|
|
### **F16 (Float 16) – More widely supported than BF16** |
|
- A 16-bit floating-point format offering **high precision**, but with a narrower range of values than BF16.
|
- Works on most devices with **FP16 acceleration support** (including many GPUs and some CPUs). |
|
- Narrower dynamic range than BF16, but generally sufficient for inference.
|
|
|
📌 **Use F16 if:** |
|
✔ Your hardware supports **FP16** but **not BF16**. |
|
✔ You need a **balance between speed, memory usage, and accuracy**. |
|
✔ You are running on a **GPU** or another device optimized for FP16 computations. |
|
|
|
📌 **Avoid F16 if:** |
|
❌ Your device lacks **native FP16 support** (it may run slower than expected). |
|
❌ You have memory limitations. |
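If you are unsure whether your device exposes native BF16 or FP16, a quick check like the one below can help. This is a sketch that assumes a PyTorch CUDA build; other backends (Metal, ROCm, plain CPU) expose different capability queries.

```python
import torch

# Rough capability check before picking a BF16/F16/quantized GGUF variant.
if torch.cuda.is_available():
    major, minor = torch.cuda.get_device_capability()
    print("GPU:", torch.cuda.get_device_name(0))
    print("BF16 supported:", torch.cuda.is_bf16_supported())
    print("FP16 tensor cores:", major >= 7)  # heuristic: Volta and newer
else:
    print("No CUDA device found; consider the quantized (Q4_K/Q6_K/Q8_0) files for CPU inference.")
```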
|
|
|
--- |
|
|
|
### **Quantized Models (Q4_K, Q6_K, Q8, etc.) – For CPU & Low-VRAM Inference** |
|
Quantization reduces model size and memory usage while maintaining as much accuracy as possible. |
|
- **Lower-bit models (Q4_K)** → **Best for minimal memory usage**, may have lower precision. |
|
- **Higher-bit models (Q6_K, Q8_0)** → **Better accuracy**, requires more memory. |
|
|
|
📌 **Use Quantized Models if:** |
|
✔ You are running inference on a **CPU** and need an optimized model. |
|
✔ Your device has **low VRAM** and cannot load full-precision models. |
|
✔ You want to reduce **memory footprint** while keeping reasonable accuracy. |
|
|
|
📌 **Avoid Quantized Models if:** |
|
❌ You need **maximum accuracy** (full-precision models are better for this). |
|
❌ Your hardware has enough VRAM for higher-precision formats (BF16/F16). |
|
|
|
--- |
|
|
|
### **Very Low-Bit Quantization (IQ3_XS, IQ3_S, IQ3_M, Q4_K, Q4_0)** |
|
These models are optimized for **extreme memory efficiency**, making them ideal for **low-power devices** or **large-scale deployments** where memory is a critical constraint. |
|
|
|
- **IQ3_XS**: Ultra-low-bit quantization (3-bit) with **extreme memory efficiency**. |
|
- **Use case**: Best for **ultra-low-memory devices** where even Q4_K is too large. |
|
- **Trade-off**: Lower accuracy compared to higher-bit quantizations. |
|
|
|
- **IQ3_S**: Small block size for **maximum memory efficiency**. |
|
- **Use case**: Best for **low-memory devices** where **IQ3_XS** is too aggressive. |
|
|
|
- **IQ3_M**: Medium block size for better accuracy than **IQ3_S**. |
|
- **Use case**: Suitable for **low-memory devices** where **IQ3_S** is too limiting. |
|
|
|
- **Q4_K**: 4-bit quantization with **block-wise optimization** for better accuracy. |
|
- **Use case**: Best for **low-memory devices** where **Q6_K** is too large. |
|
|
|
- **Q4_0**: Pure 4-bit quantization, optimized for **ARM devices**. |
|
- **Use case**: Best for **ARM-based devices** or **low-memory environments**. |
|
|
|
--- |
|
|
|
### **Summary Table: Model Format Selection** |
|
|
|
| Model Format | Precision | Memory Usage | Device Requirements | Best Use Case |
|--------------|------------|---------------|----------------------|---------------|
| **BF16** | Highest | High | BF16-supported GPU/CPUs | High-speed inference with reduced memory |
| **F16** | High | High | FP16-supported devices | GPU inference when BF16 isn't available |
| **Q4_K** | Medium-Low | Low | CPU or low-VRAM devices | Best for memory-constrained environments |
| **Q6_K** | Medium | Moderate | CPU with more memory | Better accuracy while still being quantized |
| **Q8_0** | High | Moderate | CPU or GPU with enough VRAM | Best accuracy among quantized models |
| **IQ3_XS** | Very Low | Very Low | Ultra-low-memory devices | Extreme memory efficiency, low accuracy |
| **Q4_0** | Low | Low | ARM or low-memory devices | llama.cpp can optimize for ARM devices |
|
|
|
--- |
|
|
|
## **Included Files & Details** |
|
|
|
### `Llama-3.2-3B-F1-Reasoning-Instruct-bf16.gguf` |
|
- Model weights preserved in **BF16**. |
|
- Use this if you want to **requantize** the model into a different format. |
|
- Best if your device supports **BF16 acceleration**. |
|
|
|
### `Llama-3.2-3B-F1-Reasoning-Instruct-f16.gguf` |
|
- Model weights stored in **F16**. |
|
- Use if your device supports **FP16**, especially if BF16 is not available. |
|
|
|
### `Llama-3.2-3B-F1-Reasoning-Instruct-bf16-q8_0.gguf` |
|
- **Output & embeddings** remain in **BF16**. |
|
- All other layers quantized to **Q8_0**. |
|
- Use if your device supports **BF16** and you want a quantized version. |
|
|
|
### `Llama-3.2-3B-F1-Reasoning-Instruct-f16-q8_0.gguf` |
|
- **Output & embeddings** remain in **F16**. |
|
- All other layers quantized to **Q8_0**. |
|
|
|
### `Llama-3.2-3B-F1-Reasoning-Instruct-q4_k.gguf` |
|
- **Output & embeddings** quantized to **Q8_0**. |
|
- All other layers quantized to **Q4_K**. |
|
- Good for **CPU inference** with limited memory. |
|
|
|
### `Llama-3.2-3B-F1-Reasoning-Instruct-q4_k_s.gguf` |
|
- Smallest **Q4_K** variant, using less memory at the cost of accuracy. |
|
- Best for **very low-memory setups**. |
|
|
|
### `Llama-3.2-3B-F1-Reasoning-Instruct-q6_k.gguf` |
|
- **Output & embeddings** quantized to **Q8_0**. |
|
- All other layers quantized to **Q6_K** . |
|
|
|
### `Llama-3.2-3B-F1-Reasoning-Instruct-q8_0.gguf` |
|
- Fully **Q8** quantized model for better accuracy. |
|
- Requires **more memory** but offers higher precision. |
|
|
|
### `Llama-3.2-3B-F1-Reasoning-Instruct-iq3_xs.gguf` |
|
- **IQ3_XS** quantization, optimized for **extreme memory efficiency**. |
|
- Best for **ultra-low-memory devices**. |
|
|
|
### `Llama-3.2-3B-F1-Reasoning-Instruct-iq3_m.gguf` |
|
- **IQ3_M** quantization, offering a **medium block size** for better accuracy. |
|
- Suitable for **low-memory devices**. |
|
|
|
### `Llama-3.2-3B-F1-Reasoning-Instruct-q4_0.gguf` |
|
- Pure **Q4_0** quantization, optimized for **ARM devices**. |
|
- Best for **low-memory environments**. |
|
- Prefer IQ4_NL for better accuracy. |
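As a quick local sanity check, the quantized files can be loaded with the llama-cpp-python bindings. This is a minimal sketch assuming the package is installed and the q4_k file has been downloaded into the working directory:

```python
from llama_cpp import Llama

# Load the Q4_K variant (adjust model_path to whichever file you downloaded).
llm = Llama(
    model_path="Llama-3.2-3B-F1-Reasoning-Instruct-q4_k.gguf",
    n_ctx=2048,    # context window used in the benchmarks above
    n_threads=4,   # tune to your CPU
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "台北的天氣如何?"}],
    max_tokens=256,
    temperature=0.6,
)
print(out["choices"][0]["message"]["content"])
```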
|
|
|
# <span id="testllm" style="color: #7F7FFF;">🚀 If you find these models useful</span> |
|
❤ **Please click "Like" if you find this useful!** |
|
Help me test my **AI-Powered Network Monitor Assistant** with **quantum-ready security checks**: |
|
👉 [Free Network Monitor](https://readyforquantum.com/dashboard/?assistant=open) |
|
|
|
💬 **How to test**: |
|
Choose an **AI assistant type**: |
|
- `TurboLLM` (GPT-4o-mini) |
|
- `HugLLM` (Hugging Face open-source)
|
- `TestLLM` (Experimental CPU-only) |
|
|
|
### **What I’m Testing** |
|
I’m pushing the limits of **small open-source models for AI network monitoring**, specifically: |
|
- **Function calling** against live network services |
|
- **How small can a model go** while still handling: |
|
- Automated **Nmap scans** |
|
- **Quantum-readiness checks** |
|
- **Network Monitoring tasks** |
|
|
|
🟡 **TestLLM** – Current experimental model (llama.cpp on 2 CPU threads): |
|
- ✅ **Zero-configuration setup** |
|
- ⏳ 30s load time (slow inference but **no API costs**) |
|
- 🔧 **Help wanted!** If you’re into **edge-device AI**, let’s collaborate! |
|
|
|
### **Other Assistants** |
|
🟢 **TurboLLM** – Uses **gpt-4o-mini** for: |
|
- **Create custom cmd processors to run .NET code on Free Network Monitor Agents**
|
- **Real-time network diagnostics and monitoring** |
|
- **Security Audits** |
|
- **Penetration testing** (Nmap/Metasploit) |
|
- 🔑 Get more tokens by logging in or [downloading our Free Network Monitor Agent with integrated AI Assistant](https://readyforquantum.com/download) |
|
|
|
🔵 **HugLLM** – Latest Open-source models: |
|
- 🌐 Runs on Hugging Face Inference API |
|
|
|
### 💡 **Example commands you could test**:
|
1. `"Give me info on my websites SSL certificate"` |
|
2. `"Check if my server is using quantum safe encyption for communication"` |
|
3. `"Run a comprehensive security audit on my server"` |
|
4. '"Create a cmd processor to .. (what ever you want)" Note you need to install a Free Network Monitor Agent to run the .net code from. This is a very flexible and powerful feature. Use with caution! |
|
|
|
|
|
|
|
# Model Card for Llama-3.2-3B-F1-Reasoning-Instruct (a.k.a __Formosa-1-Reasoning__ or __F1-Reasoning__) |
|
|
|
<div align="center" style="line-height: 1;"> |
|
<a href="https://discord.gg/Cx737yw4ed" target="_blank" style="margin: 2px;"> |
|
<img alt="Discord" src="https://img.shields.io/badge/Discord-Twinkle%20AI-7289da?logo=discord&logoColor=white&color=7289da" style="display: inline-block; vertical-align: middle;"/> |
|
</a> |
|
<a href="https://huggingface.co/twinkle-ai" target="_blank" style="margin: 2px;"> |
|
<img alt="Hugging Face" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Twinkle%20AI-ffc107?color=ffc107&logoColor=white" style="display: inline-block; vertical-align: middle;"/> |
|
</a> |
|
</div> |
|
|
|
<div align="center" style="line-height: 1;"> |
|
<a href="https://huggingface.co/meta-llama/Llama-3.2-1B/blob/main/LICENSE.txt" style="margin: 2px;"> |
|
<img alt="License" src="https://img.shields.io/badge/License-llama3.2-f5de53?&color=0081fb" style="display: inline-block; vertical-align: middle;"/> |
|
</a> |
|
</div> |
|
|
|
 |
|
|
|
<!-- Provide a quick summary of what the model is/does. --> |
|
**Llama-3.2-3B-F1-Reasoning-Instruct** (a.k.a. **Formosa-1-Reasoning** or **F1-Reasoning**) is a Traditional Chinese language model jointly developed by **[Twinkle AI](https://huggingface.co/twinkle-ai)** and **[APMIC](https://www.apmic.ai/)**, with technical guidance from the [National Center for High-performance Computing](https://www.nchc.org.tw/). It is fine-tuned for the linguistic context and task requirements of Taiwan (R.O.C.), covering diverse scenarios such as law, education, and everyday applications, with a particular focus on strong instruction-following ability.
|
|
|
## Model Details |
|
|
|
### Model Description |
|
|
|
<!-- Provide a longer summary of what this model is. --> |
|
|
|
- **Developed by:** [Liang Hsun Huang](https://huggingface.co/lianghsun), [Min Yi Chen](https://huggingface.co/minyichen), [Wen Bin Lin](https://huggingface.co/tedslin), [Chao Chun Chuang](https://huggingface.co/c00cjz00) & [Dave Sung](https://huggingface.co/k1dave6412) (All authors have contributed equally to this work.)
|
- **Funded by:** [APMIC](https://www.apmic.ai/) |
|
- **Model type:** LlamaForCausalLM |
|
- **Language(s) (NLP):** Traditional Chinese & English
|
- **License:** [llama3.2](https://huggingface.co/meta-llama/Llama-3.2-1B/blob/main/LICENSE.txt) |
|
|
|
### Model Sources |
|
<!-- Provide the basic links for the model. --> |
|
|
|
- **Repository:** [twinkle-ai/Llama-3.2-3B-F1-Reasoning-Instruct](https://huggingface.co/twinkle-ai/Llama-3.2-3B-F1-Reasoning-Instruct) |
|
- **Paper:** (TBA) |
|
- **Demo:** [Playground](https://3b02.coolify.apmic.ai/) |
|
|
|
## Evaluation |
|
|
|
### Results |
|
|
|
The following results were obtained with the [🌟 Twinkle Eval](https://github.com/ai-twinkle/Eval) evaluation framework:

| Model | Eval Mode | TMMLU+ (%) | Taiwan Legal (%) | MMLU (%) | Runs | Option Order |
|------------------------------------|---------|----------------|----------------|----------------|---------|---------|
| [mistralai/Mistral-Small-24B-Instruct-2501](https://huggingface.co/mistralai/Mistral-Small-24B-Instruct-2501) | box | 56.15 (±0.0172) | 37.48 (±0.0098) | 74.61 (±0.0154) | 3 | Random |
| [meta-llama/Llama-3.2-3B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct) | box | 15.49 (±0.0104) | 25.68 (±0.0200) | 6.90 (±0.0096) | 3 | Random |
| [meta-llama/Llama-3.2-3B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct) | pattern | 35.85 (±0.0174) | 32.22 (±0.0023) | 59.33 (±0.0168) | 3 | Random |
| [MediaTek-Research/Llama-Breeze2-3B-Instruct](https://huggingface.co/MediaTek-Research/Llama-Breeze2-3B-Instruct) | pattern | 40.32 (±0.0181) | 38.92 (±0.0193) | 55.37 (±0.0180) | 3 | Random |
| [twinkle-ai/Llama-3.2-3B-F1-Reasoning-Instruct](https://huggingface.co/twinkle-ai/Llama-3.2-3B-F1-Reasoning-Instruct) (ours) | box | 46.16 (±0.0198) | 34.92 (±0.0243) | 51.22 (±0.0206) | 3 | Random |
|
|
|
The following results were obtained with the lighteval evaluation framework:

| Model | MATH-500 | GPQA Diamond |
|--------------------------------------------|----------|--------------|
| [meta-llama/Llama-3.2-3B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct) | 44.40 | 27.78 |
| [twinkle-ai/Llama-3.2-3B-F1-Reasoning-Instruct](https://huggingface.co/twinkle-ai/Llama-3.2-3B-F1-Reasoning-Instruct) (ours) | **51.40** | **33.84** |
|
|
|
|
|
--- |
|
|
|
## 🔧 Tool Calling |
|
|
|
This model is trained with the Hermes tool-calling format and supports parallel calling. A complete example workflow follows.

The tool-call template is already built into the chat template, so you can use it as-is. Enjoy!
|
|
|
### 1️⃣ Start the vLLM Backend

> **⚠️ Note: vLLM >= 0.8.3 is required; otherwise `enable-reasoning` and `enable-auto-tool-choice` cannot be enabled at the same time.**
|
|
|
```bash |
|
vllm serve twinkle-ai/Llama-3.2-3B-F1-Reasoning-Instruct \ |
|
--port 8001 \ |
|
--enable-reasoning \ |
|
--reasoning-parser deepseek_r1 \ |
|
--enable-auto-tool-choice \ |
|
--tool-call-parser hermes |
|
``` |
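The later steps assume an OpenAI-compatible `client` pointed at this server. A minimal setup sketch (the API key value is arbitrary for a local vLLM server):

```python
import json  # used later to parse tool-call arguments

from openai import OpenAI

# Point the OpenAI-compatible client at the vLLM server started above.
client = OpenAI(
    base_url="http://localhost:8001/v1",
    api_key="EMPTY",  # vLLM does not validate the key by default
)
```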
|
|
|
### 2️⃣ Define the Tools (Functions)
|
|
|
```python |
|
def get_weather(location: str, unit: str): |
|
return f"{location}的氣溫是{unit}26度,晴朗無風" |
|
|
|
def search(query: str): |
|
return "川普終於宣布對等關稅政策,針對 18 個經濟體課徵一半的對等關稅,並從 4/5 起對所有進口產品徵收10%的基準關稅!美國將針對被認定為不當貿易行為(不公平貿易) 的國家,於 4/9 起課徵報復型對等關稅 (Discounted Reciprocal Tariff),例如:日本將被課徵 24% 的關稅,歐盟則為 20%,以取代普遍性的 10% 關稅。\n針對中國則開啟新一波 34% 關稅,並疊加於先前已實施的關稅上,這將使中國進口商品的基本關稅稅率達到 54%,而且這尚未包含拜登總統任內或川普第一任期所施加的額外關稅。加拿大與墨西哥則不適用這套對等關稅制度,但川普認為這些國家在芬太尼危機與非法移民問題尚未完全解決,因此計畫對這兩國的大多數進口商品施加 25% 關稅。另外原本針對汽車與多數其他商品的關稅豁免將於 4/2 到期。\n台灣的部分,美國擬向台灣課徵32%的對等關稅,雖然並未針對晶片特別課徵關稅,但仍在記者會中提到台灣搶奪所有的電腦與半導體晶片,最終促成台積電對美國投資計劃額外加碼 1,000 億美元的歷史性投資;歐盟則課徵20%的對等關稅。最後是汽車關稅將於 4/2 起,對所有外國製造的汽車課徵25% 關稅。" |
|
|
|
tools = [ |
|
{ |
|
"type": "function", |
|
"function": { |
|
"name": "get_weather", |
|
"description": "Get the current weather in a given location", |
|
"parameters": { |
|
"type": "object", |
|
"properties": { |
|
"location": {"type": "string", "description": "國家或城市名, e.g., 'Taipei'、'Jaipei'"}, |
|
"unit": {"type": "string", "description": "氣溫單位,亞洲城市使用攝氏;歐美城市使用華氏", "enum": ["celsius", "fahrenheit"]} |
|
}, |
|
"required": ["location", "unit"] |
|
} |
|
} |
|
}, |
|
{ |
|
"type": "function", |
|
"function": { |
|
"name": "search", |
|
"description": "這是一個類似 Google 的搜尋引擎,關於知識、天氣、股票、電影、小說、百科等等問題,如果你不確定答案就搜尋一下。", |
|
"parameters": { |
|
"type": "object", |
|
"properties": { |
|
"query": {"type": "string", "description": "should be a search query, e.g., '2024 南韓 戒嚴'"} |
|
}, |
|
"required": ["query"] |
|
} |
|
} |
|
} |
|
] |
|
``` |
|
|
|
### 3️⃣ Run the Tool Calls
|
|
|
> **⚠️ Note: the system prompt can be omitted, unless a tool needs a time reference.**
|
```python |
|
response = client.chat.completions.create( |
|
model=client.models.list().data[0].id, |
|
messages=[ |
|
{"role": "system", "content": "記住你的知識截止於 2024/12,今天是 2025/4/7"}, |
|
{"role": "user", "content": "台北氣溫如何? 另外,告訴我川普最新關稅政策"}, |
|
], |
|
max_tokens=1500, |
|
temperature=0.6, |
|
top_p=0.95, |
|
tools=tools, |
|
tool_choice="auto" |
|
) |
|
|
|
print(response.choices[0].message.reasoning_content) |
|
print(response.choices[0].message.tool_calls) |
|
``` |
|
|
|
#### 🧠 Reasoning Output (excerpt)

> OK, I need to help the user with their request. They asked two things: first, the current weather in Taipei, and second, Trump's latest tariff policy.
> For the first part, they mentioned "Taipei", so the get_weather function should be called…
> Next comes Trump's new tariff policy…
> To sum up, I need to make two separate API calls, each with its own correctly filled-in arguments…
|
|
|
#### ⚙️ Tool Calls List |
|
|
|
|
|
```python
|
[ChatCompletionMessageToolCall(id='chatcmpl-tool-35e74420119349999913a10133b84bd3', function=Function(arguments='{"location": "Taipei", "unit": "celsius"}', name='get_weather'), type='function'), ChatCompletionMessageToolCall(id='chatcmpl-tool-7ffdcb98e59f4134a6171defe7f2e31b', function=Function(arguments='{"query": "Donald Trump latest tariffs policy"}', name='search'), type='function')] |
|
``` |
|
|
|
### 4️⃣ Generate the Final Answer
|
|
|
```python |
|
response = client.chat.completions.create( |
|
model=client.models.list().data[0].id, |
|
messages=[ |
|
{"role": "system", "content": "記住你的知識截止於 2024/12,今天是 2025/4/7"}, |
|
{"role": "user", "content": "台北氣溫如何? 另外,告訴我川普最新關稅政策"}, |
|
{ |
|
"role": "assistant", |
|
"content": "", |
|
"tool_calls": [ |
|
{ |
|
"id": response.choices[0].message.tool_calls[0].id, |
|
"type": "function", |
|
"function": { |
|
"name": response.choices[0].message.tool_calls[0].function.name, |
|
"arguments": response.choices[0].message.tool_calls[0].function.arguments |
|
} |
|
}, |
|
{ |
|
"id": response.choices[0].message.tool_calls[1].id, |
|
"type": "function", |
|
"function": { |
|
"name": response.choices[0].message.tool_calls[1].function.name, |
|
"arguments": response.choices[0].message.tool_calls[1].function.arguments |
|
} |
|
} |
|
] |
|
}, |
|
{ |
|
"role": "tool", |
|
"content": search(**json.loads(response.choices[0].message.tool_calls[0].function.arguments)), |
|
"tool_call_id": response.choices[0].message.tool_calls[0].id # tool_call_id 必須要帶,才能正確配對 工具 及 tool_call |
|
}, |
|
{ |
|
"role": "tool", |
|
"content": get_weather(**json.loads(response.choices[0].message.tool_calls[1].function.arguments)), |
|
"tool_call_id": response.choices[0].message.tool_calls[1].id # tool_call_id 必須要帶,才能正確配對 工具 及 tool_call |
|
} |
|
], |
|
max_tokens=1500, |
|
temperature=0.6, |
|
top_p=0.95, |
|
tools=tools, |
|
tool_choice="auto" |
|
) |
|
|
|
print(response.choices[0].message.reasoning_content) |
|
print(response.choices[0].message.content) |
|
``` |
|
|
|
#### 🧠 Reasoning Output (excerpt)

> First, I need to handle the user's query: they want the current temperature in Taipei and Trump's latest tariff policy…
> After calling `get_weather`, the temperature in Taipei came back as 26 degrees (Celsius)…
> Next, I used `search` to look up "Trump latest tariff policy 2025"…
> Combining the results, I provide the following summary:
|
|
|
#### 📋 Final Output
|
|
|
```text |
|
以下是您請求的資訊: |
|
|
|
**臺北市氣溫** |
|
- 目前的氣溫為 **26°C**(攝氏) |
|
- 天候狀況:晴朗無風 |
|
|
|
**川普最新關稅政策概述** |
|
1. **對等關稅政策** |
|
- 對 18 個經濟體課徵 50% 的對等關稅 |
|
- 自 4 月 5 日起,所有進口產品全面徵收 10% 基本關稅 |
|
|
|
2. **報復型對等關稅** |
|
- 日本 24%、歐盟 20% |
|
|
|
3. **對中國的高額關稅** |
|
- 增加至 54%(原有關稅 + 新增 34%) |
|
|
|
4. **特殊案例** |
|
- 加拿大與墨西哥不適用,但其他商品課徵 25% |
|
- 汽車與部分商品的免稅即將到期 |
|
|
|
5. **對台灣的影響** |
|
- 美國計畫對台灣課徵 32% 關稅,但晶片暫無額外課稅 |
|
|
|
6. **全球視角** |
|
- 歐盟與日本關稅比例相對較高 |
|
``` |
|
|
|
|
|
## Citation |
|
|
|
<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. --> |
|
```bibtex
|
@misc{twinkleai2025llama3.2f1, |
|
title = {Llama-3.2-3B-F1-Reasoning-Instruct: A Traditional Chinese Instruction-Tuned Reasoning Language Model for Taiwan}, |
|
author = {Huang, Liang Hsun and Chen, Min Yi and Lin, Wen Bin and Chuang, Chao Chun and Sung, Dave}, |
|
year = {2025}, |
|
howpublished = {\url{https://huggingface.co/twinkle-ai/Llama-3.2-3B-F1-Reasoning-Instruct}},
|
note = {Twinkle AI and APMIC. All authors contributed equally.} |
|
} |
|
``` |
|
|
|
## Acknowledgements

- Special thanks to the [National Center for High-performance Computing](https://www.nchc.org.tw/) for its guidance and to [APMIC](https://www.apmic.ai/) for providing the compute that allowed this project to be completed smoothly.

- Thanks also to 黃啟聖老師, 許武龍(哈爸), 陳姿燁老師 (physics, Taipei First Girls High School), Howard (CTO of [奈視科技 NANOSEEX](https://nanoseex.com/)), [AIPLUX Technology](https://aiplux.com/), 郭家嘉老師, and everyone else who provided valuable help during dataset construction.
|
|
|
## Model Card Authors |
|
|
|
[Twinkle AI](https://huggingface.co/twinkle-ai) |
|
|
|
## Model Card Contact |
|
|
|
[Twinkle AI](https://huggingface.co/twinkle-ai) |