---
license: llama3.2
language:
- en
- zh
base_model:
- meta-llama/Llama-3.2-3B
- lianghsun/Llama-3.2-3B-F1-Base
library_name: transformers
tags:
- Taiwan
- R.O.C
- zhtw
- SLM
- Llama-32
datasets:
- lianghsun/tw-reasoning-instruct
- minyichen/tw-instruct-R1-200k
- minyichen/tw_mm_R1
model-index:
- name: Llama-3.2-3B-F1-Reasoning-Instruct
  results:
  - task:
      type: question-answering
      name: Single Choice Question
    dataset:
      type: ikala/tmmluplus
      name: tmmlu+
      config: all
      split: test
      revision: c0e8ae955997300d5dbf0e382bf0ba5115f85e8c
    metrics:
    - name: single choice
      type: accuracy
      value: 46.16
  - task:
      type: question-answering
      name: Single Choice Question
    dataset:
      type: cais/mmlu
      name: mmlu
      config: all
      split: test
      revision: c30699e
    metrics:
    - name: single choice
      type: accuracy
      value: 51.22
  - task:
      type: question-answering
      name: Single Choice Question
    dataset:
      type: lianghsun/tw-legal-benchmark-v1
      name: tw-legal-benchmark-v1
      config: all
      split: test
      revision: 66c3a5f
    metrics:
    - name: single choice
      type: accuracy
      value: 34.92
metrics:
- accuracy
---

# <span style="color: #7FFF7F;">Llama-3.2-3B-F1-Reasoning-Instruct GGUF Models</span>


## <span style="color: #7F7FFF;">Model Generation Details</span>

This model was generated using [llama.cpp](https://github.com/ggerganov/llama.cpp) at commit [`064cc596`](https://github.com/ggerganov/llama.cpp/commit/064cc596ac44308dc326a17c9e3163c34a6f29d1).

## <span style="color: #7FFF7F;">Ultra-Low-Bit Quantization with IQ-DynamicGate (1-2 bit)</span>

Our latest quantization method introduces **precision-adaptive quantization** for ultra-low-bit models (1-2 bit). It showed benchmarked improvements on **Llama-3-8B**. The approach uses layer-specific strategies to preserve accuracy while keeping memory use extremely low.

### **Benchmark Context**
All tests were conducted on **Llama-3-8B-Instruct** using:
- A standard perplexity evaluation pipeline
- A 2048-token context window
- The same prompt set across all quantizations

### **Method**
- **Dynamic Precision Allocation**:
  - First/last 25% of layers → IQ4_XS (selected layers)
  - Middle 50% → IQ2_XXS/IQ3_S (increased efficiency)
- **Critical Component Protection**:
  - Embeddings/output layers use Q5_K
  - Reduces error propagation by 38% vs. standard 1-2 bit quantization
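
Below is a minimal sketch of the layer-wise allocation described above. It assumes a model with `n_layers` transformer blocks and llama.cpp-style tensor names (`blk.N`, `token_embd.weight`, `output.weight`). The text does not say which layers inside the edge bands are "selected" for IQ4_XS, so the hypothetical `dynamic_gate_plan` gives IQ4_XS to every block in those bands. It illustrates the idea only and is not the actual IQ-DynamicGate implementation.

```python
def dynamic_gate_plan(n_layers: int, middle_type: str = "IQ2_XXS") -> dict:
    """Hypothetical per-tensor quantization plan following the split above.

    Assumptions (not the real IQ-DynamicGate code): the first and last 25% of
    blocks use IQ4_XS, the middle 50% use `middle_type`, and the token
    embedding and output tensors use Q5_K.
    """
    plan = {"token_embd.weight": "Q5_K", "output.weight": "Q5_K"}
    edge = n_layers // 4  # number of blocks in each 25% edge band
    for i in range(n_layers):
        in_edge_band = i < edge or i >= n_layers - edge
        plan[f"blk.{i}"] = "IQ4_XS" if in_edge_band else middle_type
    return plan


plan = dynamic_gate_plan(32)
print(plan["blk.0"], plan["blk.16"], plan["blk.31"], plan["output.weight"])
```

For a 32-block model such as Llama-3-8B, this assigns IQ4_XS to blocks 0-7 and 24-31 and the middle type to blocks 8-23.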

### **Quantization Performance Comparison (Llama-3-8B)**

| Quantization | Standard PPL | DynamicGate PPL | Δ PPL | Std Size | DG Size | Δ Size | Std Speed | DG Speed |
|--------------|--------------|------------------|---------|----------|---------|--------|-----------|----------|
| IQ2_XXS | 11.30 | 9.84 | -12.9% | 2.5G | 2.6G | +0.1G | 234s | 246s |
| IQ2_XS | 11.72 | 11.63 | -0.8% | 2.7G | 2.8G | +0.1G | 242s | 246s |
| IQ2_S | 14.31 | 9.02 | -36.9% | 2.7G | 2.9G | +0.2G | 238s | 244s |
| IQ1_M | 27.46 | 15.41 | -43.9% | 2.2G | 2.5G | +0.3G | 206s | 212s |
| IQ1_S | 53.07 | 32.00 | -39.7% | 2.1G | 2.4G | +0.3G | 184s | 209s |

**Key**:
- PPL = Perplexity (lower is better)
- Δ PPL = Percentage change from standard to DynamicGate
- Speed = Inference time (CPU with AVX2, 2048-token context)
- Size differences reflect mixed quantization overhead

**Key Improvements:**
- 🔥 **IQ1_M** shows a massive 43.9% perplexity reduction (27.46 → 15.41)
- 🚀 **IQ2_S** cuts perplexity by 36.9% while adding only 0.2 GB
- ⚡ **IQ1_S** achieves 39.7% lower perplexity (53.07 → 32.00) despite 1-bit quantization

**Tradeoffs:**
- All variants have modest size increases (0.1-0.3 GB)
- Inference times stay within about 5% for most variants, but IQ1_S is roughly 14% slower (184s → 209s)
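
The Δ PPL column and the speed tradeoff can both be recomputed from the raw numbers in the table. The sketch below uses only the values shown above. Its results may differ from the table in the last digit because of rounding.

```python
# Values copied from the table above (Llama-3-8B-Instruct, 2048-token context).
RESULTS = {
    # quant: (standard PPL, DynamicGate PPL, standard time s, DynamicGate time s)
    "IQ2_XXS": (11.30, 9.84, 234, 246),
    "IQ2_XS": (11.72, 11.63, 242, 246),
    "IQ2_S": (14.31, 9.02, 238, 244),
    "IQ1_M": (27.46, 15.41, 206, 212),
    "IQ1_S": (53.07, 32.00, 184, 209),
}


def pct_change(before: float, after: float) -> float:
    """Percentage change from `before` to `after` (negative means a decrease)."""
    return (after - before) / before * 100


for quant, (ppl_std, ppl_dg, t_std, t_dg) in RESULTS.items():
    print(f"{quant:8s} PPL {pct_change(ppl_std, ppl_dg):+6.1f}%   "
          f"time {pct_change(t_std, t_dg):+5.1f}%")
```

IQ1_S is the outlier on speed: its time change works out to about +13.6%.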


### **When to Use These Models**
✔ **Fitting models into GPU VRAM**

✔ **Memory-constrained deployments**

✔ **CPU and edge devices** where 1-2 bit errors can be tolerated

✔ **Research** into ultra-low-bit quantization



## **Choosing the Right Model Format**

Selecting the correct model format depends on your **hardware capabilities** and **memory constraints**.

### **BF16 (Brain Float 16) – Use if BF16 acceleration is available**
- A 16-bit floating-point format designed for **faster computation** while retaining good precision.
- Provides the **same dynamic range** as FP32 (both use an 8-bit exponent) with **half the memory**.
- Recommended if your hardware supports **BF16 acceleration** (check your device's specs).
- Ideal for **high-performance inference** with a **reduced memory footprint** compared to FP32.

📌 **Use BF16 if:**
✔ Your hardware has native **BF16 support** (e.g., newer GPUs, TPUs).
✔ You want **higher precision** while saving memory.
✔ You plan to **requantize** the model into another format.

📌 **Avoid BF16 if:**
❌ Your hardware does **not** support BF16 (it may fall back to FP32 and run slower).
❌ You need compatibility with older devices that lack BF16 optimization.

---

### **F16 (Float 16) – More widely supported than BF16**
- A 16-bit floating-point format with **high precision** but a narrower range of values than BF16.
- Works on most devices with **FP16 acceleration support** (including many GPUs and some CPUs).
- Keeps more mantissa bits than BF16 (10 vs. 7) but overflows above 65504. It is generally sufficient for inference.

📌 **Use F16 if:**
✔ Your hardware supports **FP16** but **not BF16**.
✔ You need a **balance between speed, memory usage, and accuracy**.
✔ You are running on a **GPU** or another device optimized for FP16 computations.

📌 **Avoid F16 if:**
❌ Your device lacks **native FP16 support** (it may run slower than expected).
❌ You have memory limitations.
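
The tradeoff between BF16 (wide range, fewer mantissa bits) and F16 (narrow range, more mantissa bits) can be seen with the Python standard library alone. The sketch below emulates BF16 by truncating a float32 to its top 16 bits. That is a simplification: real hardware usually rounds to nearest even. It uses `struct`'s `'e'` format for IEEE half precision.

```python
import struct


def to_bf16(x: float) -> float:
    """Round-trip x through bfloat16 by keeping the top 16 bits of its float32 encoding.

    Truncation (round toward zero) is a simplification of real BF16 rounding.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]


def to_f16(x: float) -> float:
    """Round-trip x through IEEE 754 half precision (struct format 'e')."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


# Precision: FP16 keeps 10 mantissa bits, BF16 only 7.
x = 1.0 + 2**-8
print(to_f16(x))   # 1.00390625 (exactly representable in FP16)
print(to_bf16(x))  # 1.0 (the 2**-8 term is lost in BF16)

# Range: FP16 overflows above 65504; BF16 shares FP32's 8-bit exponent.
print(to_bf16(1e5))  # 99840.0 (in range, but coarsely rounded)
try:
    to_f16(1e5)
except OverflowError:
    print("1e5 overflows FP16")
```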

---

### **Quantized Models (Q4_K, Q6_K, Q8, etc.) – For CPU & Low-VRAM Inference**
Quantization reduces model size and memory usage while preserving as much accuracy as possible.
- **Lower-bit models (Q4_K)** → **Best for minimal memory usage**, may have lower precision.
- **Higher-bit models (Q6_K, Q8_0)** → **Better accuracy**, require more memory.

📌 **Use Quantized Models if:**
✔ You are running inference on a **CPU** and need an optimized model.
✔ Your device has **low VRAM** and cannot load full-precision models.
✔ You want to reduce **memory footprint** while keeping reasonable accuracy.

📌 **Avoid Quantized Models if:**
❌ You need **maximum accuracy** (full-precision models are better for this).
❌ Your hardware has enough VRAM for higher-precision formats (BF16/F16).
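
A rough way to compare memory needs across formats is parameters × bits per weight ÷ 8. The sketch below assumes about 3.2 billion parameters for this 3B model and uses nominal llama.cpp bits-per-weight figures. Actual GGUF files mix tensor types (e.g., Q8_0 output and embeddings), and runtime memory also includes the KV cache, so treat the results as estimates.

```python
# Nominal bits per weight for common llama.cpp formats (assumed values;
# real files mix tensor types, so on-disk sizes will differ somewhat).
BITS_PER_WEIGHT = {
    "F16": 16.0,
    "BF16": 16.0,
    "Q8_0": 8.5,     # 32 int8 weights + one fp16 scale per block
    "Q6_K": 6.5625,
    "Q4_K": 4.5,
    "Q4_0": 4.5,     # 32 4-bit weights + one fp16 scale per block
}


def estimate_weights_gib(n_params: float, fmt: str) -> float:
    """Rough weight-only size in GiB; excludes KV cache and runtime buffers."""
    return n_params * BITS_PER_WEIGHT[fmt] / 8 / 2**30


# Assumption: roughly 3.2 billion parameters for a Llama-3.2-3B model.
for fmt in ("BF16", "Q8_0", "Q6_K", "Q4_K"):
    print(f"{fmt:5s} ~{estimate_weights_gib(3.2e9, fmt):.1f} GiB")
```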

---

### **Very Low-Bit Quantization (IQ3_XS, IQ3_S, IQ3_M, Q4_K, Q4_0)**
These models are optimized for **extreme memory efficiency**, making them ideal for **low-power devices** or **large-scale deployments** where memory is a critical constraint.

- **IQ3_XS**: Ultra-low-bit quantization (3-bit) with **extreme memory efficiency**.
  - **Use case**: Best for **ultra-low-memory devices** where even Q4_K is too large.
  - **Trade-off**: Lower accuracy compared to higher-bit quantizations.

- **IQ3_S**: Small block size for **high memory efficiency**.
  - **Use case**: Best for **low-memory devices** where **IQ3_XS** is too aggressive.

- **IQ3_M**: Medium block size for better accuracy than **IQ3_S**.
  - **Use case**: Suitable for **low-memory devices** where **IQ3_S** is too limiting.

- **Q4_K**: 4-bit quantization with **block-wise optimization** for better accuracy.
  - **Use case**: Best for **low-memory devices** where **Q6_K** is too large.

- **Q4_0**: Pure 4-bit quantization, optimized for **ARM devices**.
  - **Use case**: Best for **ARM-based devices** or **low-memory environments**.

---

### **Summary Table: Model Format Selection**

| Model Format | Precision | Memory Usage | Device Requirements | Best Use Case |
|--------------|------------|---------------|----------------------|---------------|
| **BF16** | High (FP32-like range) | High | BF16-supported GPU/CPUs | High-speed inference with reduced memory |
| **F16** | High (narrower range than BF16) | High | FP16-supported devices | GPU inference when BF16 isn't available |
| **Q4_K** | Medium Low | Low | CPU or Low-VRAM devices | Best for memory-constrained environments |
| **Q6_K** | Medium | Moderate | CPU with more memory | Better accuracy while still being quantized |
| **Q8_0** | High | Moderate | CPU or GPU with enough VRAM | Best accuracy among quantized models |
| **IQ3_XS** | Very Low | Very Low | Ultra-low-memory devices | Extreme memory efficiency and low accuracy |
| **Q4_0** | Low | Low | ARM or low-memory devices | llama.cpp can optimize for ARM devices |
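
The summary table can be read as a simple decision procedure. The hypothetical `suggest_format` helper below encodes it under two stated assumptions: a 20% memory headroom for the KV cache, and quantized sizes scaled from the 16-bit size by nominal bits per weight. It is a rule of thumb, not a benchmark-based recommendation.

```python
def suggest_format(has_bf16: bool, has_fp16: bool, free_mem_gib: float,
                   fp16_size_gib: float, is_arm: bool = False) -> str:
    """Hypothetical helper encoding the summary table above.

    `fp16_size_gib` is the size of the 16-bit model; quantized sizes are
    approximated by scaling it (assumed ratios, not measured file sizes).
    """
    headroom = 1.2  # assumed 20% extra for KV cache and runtime buffers
    if free_mem_gib >= fp16_size_gib * headroom:
        if has_bf16:
            return "BF16"
        if has_fp16:
            return "F16"
    # Approximate size relative to a 16-bit model (bits per weight / 16).
    for fmt, ratio in (("Q8_0", 8.5 / 16), ("Q6_K", 6.5625 / 16), ("Q4_K", 4.5 / 16)):
        if free_mem_gib >= fp16_size_gib * ratio * headroom:
            if fmt == "Q4_K" and is_arm:
                return "Q4_0"  # same nominal size, ARM-friendly layout
            return fmt
    return "IQ3_XS"  # when even Q4_K does not fit


print(suggest_format(has_bf16=False, has_fp16=True, free_mem_gib=16, fp16_size_gib=6.4))
print(suggest_format(has_bf16=True, has_fp16=True, free_mem_gib=2.5, fp16_size_gib=6.0))
```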

---

## **Included Files & Details**

### `Llama-3.2-3B-F1-Reasoning-Instruct-bf16.gguf`
- Model weights preserved in **BF16**.
- Use this if you want to **requantize** the model into a different format.
- Best if your device supports **BF16 acceleration**.

### `Llama-3.2-3B-F1-Reasoning-Instruct-f16.gguf`
- Model weights stored in **F16**.
- Use if your device supports **FP16**, especially if BF16 is not available.

### `Llama-3.2-3B-F1-Reasoning-Instruct-bf16-q8_0.gguf`
- **Output & embeddings** remain in **BF16**.
- All other layers quantized to **Q8_0**.
- Use if your device supports **BF16** and you want a quantized version.

### `Llama-3.2-3B-F1-Reasoning-Instruct-f16-q8_0.gguf`
- **Output & embeddings** remain in **F16**.
- All other layers quantized to **Q8_0**.

### `Llama-3.2-3B-F1-Reasoning-Instruct-q4_k.gguf`
- **Output & embeddings** quantized to **Q8_0**.
- All other layers quantized to **Q4_K**.
- Good for **CPU inference** with limited memory.
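
Several files above keep the output and embedding tensors at higher precision than the rest of the model. llama.cpp's `llama-quantize` tool exposes options for this. The sketch below only builds the command line and does not run it. It assumes a llama.cpp build whose `llama-quantize` supports `--output-tensor-type` and `--token-embedding-type`; check `llama-quantize --help` for your build. The exact commands used to produce these files are not documented here.

```python
from typing import List, Optional


def quantize_argv(src: str, dst: str, qtype: str,
                  output_type: Optional[str] = None,
                  embedding_type: Optional[str] = None) -> List[str]:
    """Build (but do not run) a llama-quantize command line.

    Assumes a llama.cpp build whose llama-quantize accepts --output-tensor-type
    and --token-embedding-type; options must precede the positional arguments.
    """
    argv = ["llama-quantize"]
    if output_type:
        argv += ["--output-tensor-type", output_type]
    if embedding_type:
        argv += ["--token-embedding-type", embedding_type]
    return argv + [src, dst, qtype]


# Hypothetical invocation mirroring the q4_k file's description:
# output and embeddings at Q8_0, everything else at a Q4_K preset.
argv = quantize_argv(
    "Llama-3.2-3B-F1-Reasoning-Instruct-bf16.gguf",
    "Llama-3.2-3B-F1-Reasoning-Instruct-q4_k.gguf",
    "Q4_K_M",
    output_type="q8_0",
    embedding_type="q8_0",
)
print(" ".join(argv))
```

To execute the command, pass the list to `subprocess.run(argv, check=True)`.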

### `Llama-3.2-3B-F1-Reasoning-Instruct-q4_k_s.gguf`
- Smallest **Q4_K** variant, using less memory at the cost of accuracy.
- Best for **very low-memory setups**.

### `Llama-3.2-3B-F1-Reasoning-Instruct-q6_k.gguf`
- **Output & embeddings** quantized to **Q8_0**.
- All other layers quantized to **Q6_K**.

### `Llama-3.2-3B-F1-Reasoning-Instruct-q8_0.gguf`
- Fully **Q8_0**-quantized model for better accuracy.
- Requires **more memory** but offers higher precision.

### `Llama-3.2-3B-F1-Reasoning-Instruct-iq3_xs.gguf`
- **IQ3_XS** quantization, optimized for **extreme memory efficiency**.
- Best for **ultra-low-memory devices**.

### `Llama-3.2-3B-F1-Reasoning-Instruct-iq3_m.gguf`
- **IQ3_M** quantization, offering a **medium block size** for better accuracy.
- Suitable for **low-memory devices**.

### `Llama-3.2-3B-F1-Reasoning-Instruct-q4_0.gguf`
- Pure **Q4_0** quantization, optimized for **ARM devices**.
- Best for **low-memory environments**.
- Prefer IQ4_NL if you need better accuracy.
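
After downloading one of these files, a quick sanity check is to read the fixed-size GGUF header. The sketch assumes GGUF version 2 or later, stored little-endian: the magic `GGUF`, then a uint32 version, then uint64 tensor and metadata key-value counts. It does not parse the metadata itself. The self-check at the end uses a synthetic header rather than a real file.

```python
import io
import struct
from typing import BinaryIO


def read_gguf_header(f: BinaryIO) -> dict:
    """Read the fixed GGUF header: magic, version, tensor count, metadata KV count.

    Assumes little-endian GGUF v2+ (uint32 version followed by two uint64
    counts); version 1 used 32-bit counts and is rejected here.
    """
    magic = f.read(4)
    if magic != b"GGUF":
        raise ValueError(f"not a GGUF file (magic={magic!r})")
    (version,) = struct.unpack("<I", f.read(4))
    if version < 2:
        raise ValueError(f"unsupported GGUF version {version}")
    tensor_count, kv_count = struct.unpack("<QQ", f.read(16))
    return {"version": version, "tensor_count": tensor_count, "metadata_kv_count": kv_count}


# Self-check with a synthetic 24-byte header (not a real model file).
demo = io.BytesIO(b"GGUF" + struct.pack("<IQQ", 3, 2, 1))
print(read_gguf_header(demo))  # {'version': 3, 'tensor_count': 2, 'metadata_kv_count': 1}

# With a downloaded file (path is illustrative):
# with open("Llama-3.2-3B-F1-Reasoning-Instruct-q4_k.gguf", "rb") as f:
#     print(read_gguf_header(f))
```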

# <span id="testllm" style="color: #7F7FFF;">🚀 If you find these models useful</span>
❤ **Please click "Like" if you find this useful!**
Help me test my **AI-Powered Network Monitor Assistant** with **quantum-ready security checks**:
👉 [Free Network Monitor](https://readyforquantum.com/dashboard/?assistant=open)

💬 **How to test**:
Choose an **AI assistant type**:
- `TurboLLM` (GPT-4o-mini)
- `HugLLM` (Hugging Face open-source models)
- `TestLLM` (Experimental CPU-only)

### **What I’m Testing**
I’m pushing the limits of **small open-source models for AI network monitoring**, specifically:
- **Function calling** against live network services
- **How small can a model go** while still handling:
  - Automated **Nmap scans**
  - **Quantum-readiness checks**
  - **Network Monitoring tasks**

🟡 **TestLLM** – Current experimental model (llama.cpp on 2 CPU threads):
- ✅ **Zero-configuration setup**
- ⏳ 30s load time (slow inference but **no API costs**)
- 🔧 **Help wanted!** If you’re into **edge-device AI**, let’s collaborate!

### **Other Assistants**
🟢 **TurboLLM** – Uses **gpt-4o-mini** for:
- **Creating custom cmd processors to run .NET code on Free Network Monitor Agents**
- **Real-time network diagnostics and monitoring**
- **Security Audits**
- **Penetration testing** (Nmap/Metasploit)
- 🔑 Get more tokens by logging in or [downloading our Free Network Monitor Agent with integrated AI Assistant](https://readyforquantum.com/download)

🔵 **HugLLM** – Latest open-source models:
- 🌐 Runs on Hugging Face Inference API

### 💡 **Example commands you could test**:
1. `"Give me info on my website's SSL certificate"`
2. `"Check if my server is using quantum-safe encryption for communication"`
3. `"Run a comprehensive security audit on my server"`
4. `"Create a cmd processor to .. (whatever you want)"` Note: you need to install a Free Network Monitor Agent to run the .NET code. This is a very flexible and powerful feature. Use with caution!

# Model Card for Llama-3.2-3B-F1-Reasoning-Instruct (a.k.a __Formosa-1-Reasoning__ or __F1-Reasoning__)

<div align="center" style="line-height: 1;">
  <a href="https://discord.gg/Cx737yw4ed" target="_blank" style="margin: 2px;">
    <img alt="Discord" src="https://img.shields.io/badge/Discord-Twinkle%20AI-7289da?logo=discord&logoColor=white&color=7289da" style="display: inline-block; vertical-align: middle;"/>
  </a>
  <a href="https://huggingface.co/twinkle-ai" target="_blank" style="margin: 2px;">
    <img alt="Hugging Face" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Twinkle%20AI-ffc107?color=ffc107&logoColor=white" style="display: inline-block; vertical-align: middle;"/>
  </a>
</div>

<div align="center" style="line-height: 1;">
  <a href="https://huggingface.co/meta-llama/Llama-3.2-1B/blob/main/LICENSE.txt" style="margin: 2px;">
    <img alt="License" src="https://img.shields.io/badge/License-llama3.2-f5de53?&color=0081fb" style="display: inline-block; vertical-align: middle;"/>
  </a>
</div>

<!-- Provide a quick summary of what the model is/does. -->
**Llama-3.2-3B-F1-Reasoning-Instruct** (a.k.a. **Formosa-1-Reasoning** or **F1-Reasoning**) is a Traditional Chinese language model. It was developed by **[Twinkle AI](https://huggingface.co/twinkle-ai)** in collaboration with **[APMIC](https://www.apmic.ai/)**, under the technical guidance of the [National Center for High-performance Computing](https://www.nchc.org.tw/). The model is fine-tuned for the language context and task needs of Taiwan (Republic of China). It covers diverse scenarios including law, education, and everyday applications, with a particular focus on strengthening instruction-following ability.

## Model Details

### Model Description

<!-- Provide a longer summary of what this model is. -->

- **Developed by:** [Liang Hsun Huang](https://huggingface.co/lianghsun), [Min Yi Chen](https://huggingface.co/minyichen), [Wen Bin Lin](https://huggingface.co/tedslin), [Chao Chun Chuang](https://huggingface.co/c00cjz00) & [Dave Sung](https://huggingface.co/k1dave6412) (all authors contributed equally to this work)
- **Funded by:** [APMIC](https://www.apmic.ai/)
- **Model type:** LlamaForCausalLM
- **Language(s) (NLP):** Traditional Chinese & English
- **License:** [llama3.2](https://huggingface.co/meta-llama/Llama-3.2-1B/blob/main/LICENSE.txt)

### Model Sources
<!-- Provide the basic links for the model. -->

- **Repository:** [twinkle-ai/Llama-3.2-3B-F1-Reasoning-Instruct](https://huggingface.co/twinkle-ai/Llama-3.2-3B-F1-Reasoning-Instruct)
- **Paper:** (TBA)
- **Demo:** [Playground](https://3b02.coolify.apmic.ai/)

## Evaluation

### Results

The following results were obtained with the [🌟 Twinkle Eval](https://github.com/ai-twinkle/Eval) evaluation framework.

| Model | Eval mode | TMMLU+ (%) | Taiwan Law (%) | MMLU (%) | Runs | Option order |
|------------------------------------|---------|----------------|----------------|----------------|---------|---------|
| [mistralai/Mistral-Small-24B-Instruct-2501](https://huggingface.co/mistralai/Mistral-Small-24B-Instruct-2501) | box | 56.15 (±0.0172) | 37.48 (±0.0098) | 74.61 (±0.0154) | 3 | Random |
| [meta-llama/Llama-3.2-3B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct) | box | 15.49 (±0.0104) | 25.68 (±0.0200) | 6.90 (±0.0096) | 3 | Random |
| [meta-llama/Llama-3.2-3B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct) | pattern | 35.85 (±0.0174) | 32.22 (±0.0023) | 59.33 (±0.0168) | 3 | Random |
| [MediaTek-Research/Llama-Breeze2-3B-Instruct](https://huggingface.co/MediaTek-Research/Llama-Breeze2-3B-Instruct) | pattern | 40.32 (±0.0181) | 38.92 (±0.0193) | 55.37 (±0.0180) | 3 | Random |
| [twinkle-ai/Llama-3.2-3B-F1-Reasoning-Instruct](https://huggingface.co/twinkle-ai/Llama-3.2-3B-F1-Reasoning-Instruct) (ours) | box | 46.16 (±0.0198) | 34.92 (±0.0243) | 51.22 (±0.0206) | 3 | Random |
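
The *Option order: Random* column suggests that answer options are shuffled between runs. The sketch below shows that idea: it shuffles the options and then remaps the gold answer to its new position. It is an illustration only, not code from Twinkle Eval, whose exact procedure is not described here.

```python
import random


def shuffle_choices(choices: list, answer_idx: int, rng: random.Random):
    """Return (shuffled_choices, new_answer_idx) for one multiple-choice question."""
    order = list(range(len(choices)))
    rng.shuffle(order)
    shuffled = [choices[i] for i in order]
    # The gold answer moves to wherever its original index landed.
    return shuffled, order.index(answer_idx)


# Three runs, each with its own seed, mirroring "Runs = 3, Option order = Random".
question = ["Option A text", "Option B text", "Option C text", "Option D text"]
for seed in range(3):
    shuffled, gold = shuffle_choices(question, answer_idx=2, rng=random.Random(seed))
    print(seed, "ABCD"[gold], shuffled)
```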

The following results were obtained with the lighteval evaluation framework.

| Model | MATH-500 | GPQA Diamond |
|--------------------------------------------|----------|--------------|
| [meta-llama/Llama-3.2-3B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct) | 44.40 | 27.78 |
| [twinkle-ai/Llama-3.2-3B-F1-Reasoning-Instruct](https://huggingface.co/twinkle-ai/Llama-3.2-3B-F1-Reasoning-Instruct) (ours) | **51.40**| **33.84** |


---

## 🔧 Tool Calling

This model was trained with the Hermes format and supports parallel tool calling. A complete example workflow follows.
The tool-call template is already included in the model's chat template, so it works out of the box. Enjoy!
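
In the Hermes format, each tool call is a JSON object wrapped in `<tool_call>...</tool_call>` tags. Parallel calls are simply several such blocks in a single response. When vLLM runs with `--tool-call-parser hermes`, it does this extraction on the server. The sketch below only illustrates what the raw text looks like, and it assumes the standard Hermes tags; check this model's chat template to confirm.

```python
import json
import re

# Hermes-style tool calls wrap a JSON object in <tool_call>...</tool_call> tags.
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)


def parse_hermes_tool_calls(text: str) -> list:
    """Extract every {"name": ..., "arguments": {...}} object from raw model text."""
    return [json.loads(m) for m in TOOL_CALL_RE.findall(text)]


raw = (
    '<tool_call>\n{"name": "get_weather", "arguments": {"location": "Taipei", "unit": "celsius"}}\n</tool_call>\n'
    '<tool_call>\n{"name": "search", "arguments": {"query": "Donald Trump latest tariffs policy"}}\n</tool_call>'
)
for call in parse_hermes_tool_calls(raw):
    print(call["name"], call["arguments"])
```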

### 1️⃣ Start the vLLM backend
> **⚠️ Note: vLLM >= 0.8.3 is required; otherwise `enable-reasoning` and `enable-auto-tool-choice` cannot be enabled at the same time.**

```bash
vllm serve twinkle-ai/Llama-3.2-3B-F1-Reasoning-Instruct \
  --port 8001 \
  --enable-reasoning \
  --reasoning-parser deepseek_r1 \
  --enable-auto-tool-choice \
  --tool-call-parser hermes
```
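
The `--reasoning-parser deepseek_r1` option separates the model's reasoning from its final answer. That is why the examples that follow read `reasoning_content` and `content` separately. The sketch below is a simplified client-side illustration. It assumes the reasoning is wrapped in DeepSeek-R1-style `<think>...</think>` tags; vLLM's own parser handles streaming and edge cases differently.

```python
def split_reasoning(text: str) -> tuple:
    """Split R1-style output into (reasoning, answer).

    Assumes reasoning ends with </think>; a missing opening <think> (some chat
    templates prefill it) is tolerated. Without </think>, the whole text is
    treated as the answer in this simplified version.
    """
    start, end = "<think>", "</think>"
    if end not in text:
        return "", text.strip()
    reasoning, answer = text.split(end, 1)
    if start in reasoning:
        reasoning = reasoning.split(start, 1)[1]
    return reasoning.strip(), answer.strip()


print(split_reasoning("<think>Call get_weather for Taipei.</think>It is 26°C and clear."))
```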

### 2️⃣ Define the tools (functions)

```python
# Mock tool implementations for the demo: both return canned text
# instead of calling real services.
def get_weather(location: str, unit: str):
    return f"{location}的氣溫是{unit}26度,晴朗無風"


# Returns a fixed Traditional Chinese news snippet about US tariff policy.
def search(query: str):
    return "川普終於宣布對等關稅政策,針對 18 個經濟體課徵一半的對等關稅,並從 4/5 起對所有進口產品徵收10%的基準關稅!美國將針對被認定為不當貿易行為(不公平貿易) 的國家,於 4/9 起課徵報復型對等關稅 (Discounted Reciprocal Tariff),例如:日本將被課徵 24% 的關稅,歐盟則為 20%,以取代普遍性的 10% 關稅。\n針對中國則開啟新一波 34% 關稅,並疊加於先前已實施的關稅上,這將使中國進口商品的基本關稅稅率達到 54%,而且這尚未包含拜登總統任內或川普第一任期所施加的額外關稅。加拿大與墨西哥則不適用這套對等關稅制度,但川普認為這些國家在芬太尼危機與非法移民問題尚未完全解決,因此計畫對這兩國的大多數進口商品施加 25% 關稅。另外原本針對汽車與多數其他商品的關稅豁免將於 4/2 到期。\n台灣的部分,美國擬向台灣課徵32%的對等關稅,雖然並未針對晶片特別課徵關稅,但仍在記者會中提到台灣搶奪所有的電腦與半導體晶片,最終促成台積電對美國投資計劃額外加碼 1,000 億美元的歷史性投資;歐盟則課徵20%的對等關稅。最後是汽車關稅將於 4/2 起,對所有外國製造的汽車課徵25% 關稅。"


tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather in a given location",
            "parameters": {
                "type": "object",
                "properties": {
                    # location: "country or city name"
                    "location": {"type": "string", "description": "國家或城市名, e.g., 'Taipei'、'Jaipei'"},
                    # unit: "Celsius for Asian cities; Fahrenheit for European/American cities"
                    "unit": {"type": "string", "description": "氣溫單位,亞洲城市使用攝氏;歐美城市使用華氏", "enum": ["celsius", "fahrenheit"]}
                },
                "required": ["location", "unit"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "search",
            # "A Google-like search engine for knowledge, weather, stocks, movies,
            # novels, encyclopedic questions, etc.; search if unsure of the answer."
            "description": "這是一個類似 Google 的搜尋引擎,關於知識、天氣、股票、電影、小說、百科等等問題,如果你不確定答案就搜尋一下。",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {"type": "string", "description": "should be a search query, e.g., '2024 南韓 戒嚴'"}
                },
                "required": ["query"]
            }
        }
    }
]
```
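
Each tool's `parameters` field is a JSON Schema object, so it can help to check the model's arguments against it before executing a call. The hypothetical `check_arguments` helper below is a minimal sketch. It covers only `required` keys, unexpected keys, and `enum` values; a dedicated JSON Schema library would be more appropriate for full validation.

```python
def check_arguments(tool: dict, args: dict) -> list:
    """List problems with `args` for one OpenAI-style tool spec (empty list = OK).

    Minimal sketch: checks required keys, unexpected keys, and `enum` values only;
    it is not a full JSON Schema validator.
    """
    params = tool["function"]["parameters"]
    props = params.get("properties", {})
    problems = [f"missing required argument: {k}"
                for k in params.get("required", []) if k not in args]
    for key, value in args.items():
        if key not in props:
            problems.append(f"unexpected argument: {key}")
        elif "enum" in props[key] and value not in props[key]["enum"]:
            problems.append(f"{key}={value!r} not in {props[key]['enum']}")
    return problems


# Trimmed copy of the get_weather spec (descriptions omitted).
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
            },
            "required": ["location", "unit"],
        },
    },
}

print(check_arguments(weather_tool, {"location": "Taipei", "unit": "celsius"}))  # []
print(check_arguments(weather_tool, {"location": "Taipei", "unit": "kelvin"}))
```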

### 3️⃣ Execute the tool calls

> **⚠️ Note: The system prompt is optional unless a tool needs a time reference (such as today's date).**
```python
from openai import OpenAI

# Connect to the vLLM server started in step 1 (assumed to run on localhost).
# vLLM only checks the API key if the server was started with --api-key.
client = OpenAI(base_url="http://localhost:8001/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model=client.models.list().data[0].id,
    messages=[
        # "Remember your knowledge cutoff is 2024/12; today is 2025/4/7"
        {"role": "system", "content": "記住你的知識截止於 2024/12,今天是 2025/4/7"},
        # "What's the temperature in Taipei? Also, tell me Trump's latest tariff policy"
        {"role": "user", "content": "台北氣溫如何? 另外,告訴我川普最新關稅政策"},
    ],
    max_tokens=1500,
    temperature=0.6,
    top_p=0.95,
    tools=tools,  # defined in step 2
    tool_choice="auto",
)

print(response.choices[0].message.reasoning_content)
print(response.choices[0].message.tool_calls)
```

#### 🧠 Reasoning output (excerpt, translated from Traditional Chinese)
> Okay, I need to help this user with their question. They asked two things: first, the weather in Taipei, and second, Trump's recent tariff policy.
> For the first part, they mentioned "Taipei", so I should call the get_weather function…
> Next is Trump's new tariff policy…
> To summarize, I need to make two separate API calls, each with its own correctly filled parameters…

#### ⚙️ Tool Calls List

```text
[ChatCompletionMessageToolCall(id='chatcmpl-tool-35e74420119349999913a10133b84bd3', function=Function(arguments='{"location": "Taipei", "unit": "celsius"}', name='get_weather'), type='function'), ChatCompletionMessageToolCall(id='chatcmpl-tool-7ffdcb98e59f4134a6171defe7f2e31b', function=Function(arguments='{"query": "Donald Trump latest tariffs policy"}', name='search'), type='function')]
```

### 4️⃣ Generate the final answer

```python
import json

# Map tool names to the local Python functions defined in step 2.
available_functions = {"get_weather": get_weather, "search": search}

assistant_message = response.choices[0].message  # result from step 3
messages = [
    {"role": "system", "content": "記住你的知識截止於 2024/12,今天是 2025/4/7"},
    {"role": "user", "content": "台北氣溫如何? 另外,告訴我川普最新關稅政策"},
    {
        "role": "assistant",
        "content": "",
        "tool_calls": [
            {
                "id": call.id,
                "type": "function",
                "function": {"name": call.function.name, "arguments": call.function.arguments},
            }
            for call in assistant_message.tool_calls
        ],
    },
]

# Run each requested tool by name rather than by list position, so each
# result matches its call whatever order the model emits the calls in.
for call in assistant_message.tool_calls:
    func = available_functions[call.function.name]
    messages.append({
        "role": "tool",
        "content": func(**json.loads(call.function.arguments)),
        # tool_call_id is required so each tool result is paired with its tool call
        "tool_call_id": call.id,
    })

response = client.chat.completions.create(
    model=client.models.list().data[0].id,
    messages=messages,
    max_tokens=1500,
    temperature=0.6,
    top_p=0.95,
    tools=tools,
    tool_choice="auto",
)

print(response.choices[0].message.reasoning_content)
print(response.choices[0].message.content)
```

#### 🧠 Reasoning output (excerpt, translated from Traditional Chinese)
> First, I need to handle the user's query: they want to know Taipei's current temperature and Trump's recent tariff policy…
> After calling `get_weather`, I got a temperature of 26 degrees (Celsius) for Taipei…
> Next, I used `search` to look up "Trump latest tariff policy 2025"…
> After combining the results, I provide the following summary:

#### 📋 Final output (translated from Traditional Chinese)

```text
Here is the information you requested:

**Taipei temperature**
- Current temperature: **26°C** (Celsius)
- Conditions: clear and calm

**Overview of Trump's latest tariff policy**
1. **Reciprocal tariff policy**
- A 50% reciprocal tariff on 18 economies
- From April 5, a 10% baseline tariff on all imported products

2. **Retaliatory reciprocal tariffs**
- Japan 24%, EU 20%

3. **High tariffs on China**
- Raised to 54% (existing tariffs + an additional 34%)

4. **Special cases**
- Canada and Mexico are exempt, but other goods face 25%
- Tariff exemptions for cars and some goods are about to expire

5. **Impact on Taiwan**
- The US plans a 32% tariff on Taiwan, but no additional tariff on chips for now

6. **Global perspective**
- Tariff rates for the EU and Japan are relatively high
```


## Citation

<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
```bibtex
@misc{twinkleai2025llama3.2f1,
  title        = {Llama-3.2-3B-F1-Reasoning-Instruct: A Traditional Chinese Instruction-Tuned Reasoning Language Model for Taiwan},
  author       = {Huang, Liang Hsun and Chen, Min Yi and Lin, Wen Bin and Chuang, Chao Chun and Sung, Dave},
  year         = {2025},
  howpublished = {\url{https://huggingface.co/twinkle-ai/Llama-3.2-3B-F1-Instruct}},
  note         = {Twinkle AI and APMIC. All authors contributed equally.}
}
```

## Acknowledgements
- Special thanks to the [National Center for High-performance Computing](https://www.nchc.org.tw/) for its guidance and to [APMIC](https://www.apmic.ai/) for its computing support, which made it possible to complete this project.
- We also thank 黃啟聖, 許武龍 (哈爸), 陳姿燁 (physics teacher at Taipei First Girls' High School), Howard (CTO of [奈視科技](https://nanoseex.com/)), [AIPLUX Technology](https://aiplux.com/), 郭家嘉, and everyone else who provided valuable help during dataset creation.

## Model Card Authors

[Twinkle AI](https://huggingface.co/twinkle-ai)

## Model Card Contact

[Twinkle AI](https://huggingface.co/twinkle-ai)