nielsr (HF Staff) committed
Commit 359ea43 · verified · Parent: b5bd753

Improve model card: Add library, paper, GitHub links, and MoE tag


This PR improves the model card for **SmallThinker-21BA3B-Instruct** by:
- Adding `library_name: transformers` to the metadata, which enables the "Use in Transformers" widget on the model page.
- Adding the `moe` tag to the metadata for better discoverability, since this is a Mixture-of-Experts model.
- Including a direct link to the official Hugging Face paper page: [SmallThinker: A Family of Efficient Large Language Models Natively Trained for Local Deployment](https://huggingface.co/papers/2507.20984).
- Adding a direct link to the main GitHub repository: [https://github.com/SJTU-IPADS/SmallThinker](https://github.com/SJTU-IPADS/SmallThinker).

These updates make the model more accessible and easier to understand for the community.
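For context, setting `library_name: transformers` lets the Hub surface a loading snippet like the minimal sketch below. This is an untested illustration, not taken from the model card: the repo id `PowerInfer/SmallThinker-21BA3B-Instruct` is inferred from the PowerInfer org link, and `trust_remote_code=True` is an assumption in case the MoE architecture ships custom modeling code.

```python
# Minimal sketch of what the "Use in Transformers" widget enables.
# Assumed: repo id PowerInfer/SmallThinker-21BA3B-Instruct and the need
# for trust_remote_code (custom MoE modeling code), neither confirmed here.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "PowerInfer/SmallThinker-21BA3B-Instruct"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half precision to keep the 21B MoE within reach
    device_map="auto",
    trust_remote_code=True,
)

# Chat-style generation via the tokenizer's chat template.
messages = [{"role": "user", "content": "Why do MoE models suit on-device inference?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```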

Files changed (1): README.md (+22 −20)
README.md CHANGED
```diff
@@ -1,14 +1,18 @@
 ---
-license: apache-2.0
 language:
 - en
+license: apache-2.0
 pipeline_tag: text-generation
+library_name: transformers
+tags:
+- moe
 ---
 
 ## Introduction
 
 <p align="center">
-&nbsp&nbsp🤗 <a href="https://huggingface.co/PowerInfer">Hugging Face</a>&nbsp&nbsp | &nbsp&nbsp🤖 <a href="https://modelscope.cn/organization/PowerInfer">ModelScope</a>&nbsp&nbsp | &nbsp&nbsp 📑 <a href="https://github.com/SJTU-IPADS/SmallThinker/blob/main/smallthinker-technical-report.pdf">Technical Report</a> &nbsp&nbsp
+&nbsp&nbsp🤗 <a href="https://huggingface.co/PowerInfer">Hugging Face</a>&nbsp&nbsp | &nbsp&nbsp🤖 <a href="https://modelscope.cn/organization/PowerInfer">ModelScope</a>&nbsp&nbsp | &nbsp&nbsp 📑 <a href="https://github.com/SJTU-IPADS/SmallThinker/blob/main/smallthinker-technical-report.pdf">Technical Report</a> &nbsp&nbsp
+&nbsp&nbsp 📚 <a href="https://huggingface.co/papers/2507.20984">Paper</a> &nbsp&nbsp | &nbsp&nbsp 💻 <a href="https://github.com/SJTU-IPADS/SmallThinker">GitHub Repo</a> &nbsp&nbsp
 </p>
 
 SmallThinker is a family of **on-device native** Mixture-of-Experts (MoE) language models specially designed for local deployment,
@@ -17,34 +21,32 @@ Designed from the ground up for resource-constrained environments,
 SmallThinker brings powerful, private, and low-latency AI directly to your personal devices,
 without relying on the cloud.
 
-
-
 ## Performance
 
 Note: The model is trained mainly on English.
 
-| Model | MMLU | GPQA-diamond | MATH-500 | IFEVAL | LIVEBENCH | HUMANEVAL | Average |
-|------------------------------|-------|--------------|----------|--------|-----------|-----------|---------|
-| **SmallThinker-21BA3B-Instruct** | 84.43 | <u>55.05</u> | 82.4 | **85.77** | **60.3** | <u>89.63</u> | **76.26** |
-| Gemma3-12b-it | 78.52 | 34.85 | 82.4 | 74.68 | 44.5 | 82.93 | 66.31 |
-| Qwen3-14B | <u>84.82</u> | 50 | **84.6** | <u>85.21</u>| <u>59.5</u> | 88.41 | <u>75.42</u> |
-| Qwen3-30BA3B | **85.1** | 44.4 | <u>84.4</u> | 84.29 | 58.8 | **90.24** | 74.54 |
-| Qwen3-8B | 81.79 | 38.89 | 81.6 | 83.92 | 49.5 | 85.9 | 70.26 |
-| Phi-4-14B | 84.58 | **55.45** | 80.2 | 63.22 | 42.4 | 87.2 | 68.84 |
+| Model | MMLU | GPQA-diamond | MATH-500 | IFEVAL | LIVEBENCH | HUMANEVAL | Average |
+|---|---|---|---|---|---|---|---|
+| **SmallThinker-21BA3B-Instruct** | 84.43 | <u>55.05</u> | 82.4 | **85.77** | **60.3** | <u>89.63</u> | **76.26** |
+| Gemma3-12b-it | 78.52 | 34.85 | 82.4 | 74.68 | 44.5 | 82.93 | 66.31 |
+| Qwen3-14B | <u>84.82</u> | 50 | **84.6** | <u>85.21</u>| <u>59.5</u> | 88.41 | <u>75.42</u> |
+| Qwen3-30BA3B | **85.1** | 44.4 | <u>84.4</u> | 84.29 | 58.8 | **90.24** | 74.54 |
+| Qwen3-8B | 81.79 | 38.89 | 81.6 | 83.92 | 49.5 | 85.9 | 70.26 |
+| Phi-4-14B | 84.58 | **55.45** | 80.2 | 63.22 | 42.4 | 87.2 | 68.84 |
 
 For the MMLU evaluation, we use a 0-shot CoT setting.
 
 All models are evaluated in non-thinking mode.
 
 ## Speed
-| Model | Memory(GiB) | i9 14900 | 1+13 8ge4 | rk3588 (16G) | Raspberry PI 5 |
-|--------------------------------------|---------------------|----------|-----------|--------------|----------------|
-| SmallThinker 21B+sparse | 11.47 | 30.19 | 23.03 | 10.84 | 6.61 |
-| SmallThinker 21B+sparse+limited memory | limit 8G | 20.30 | 15.50 | 8.56 | - |
-| Qwen3 30B A3B | 16.20 | 33.52 | 20.18 | 9.07 | - |
-| Qwen3 30B A3B+limited memory | limit 8G | 10.11 | 0.18 | 6.32 | - |
-| Gemma 3n E2B | 1G, theoretically | 36.88 | 27.06 | 12.50 | 6.66 |
-| Gemma 3n E4B | 2G, theoretically | 21.93 | 16.58 | 7.37 | 4.01 |
+| Model | Memory(GiB) | i9 14900 | 1+13 8ge4 | rk3588 (16G) | Raspberry PI 5 |
+|---|---|---|---|---|---|
+| SmallThinker 21B+sparse | 11.47 | 30.19 | 23.03 | 10.84 | 6.61 |
+| SmallThinker 21B+sparse+limited memory | limit 8G | 20.30 | 15.50 | 8.56 | - |
+| Qwen3 30B A3B | 16.20 | 33.52 | 20.18 | 9.07 | - |
+| Qwen3 30B A3B+limited memory | limit 8G | 10.11 | 0.18 | 6.32 | - |
+| Gemma 3n E2B | 1G, theoretically | 36.88 | 27.06 | 12.50 | 6.66 |
+| Gemma 3n E4B | 2G, theoretically | 21.93 | 16.58 | 7.37 | 4.01 |
 
 Note: i9 14900, 1+13 8ge4 use 4 threads, others use the number of threads that can achieve the maximum speed. All models here have been quantized to q4_0.
 You can deploy SmallThinker with offloading support using [PowerInfer](https://github.com/SJTU-IPADS/PowerInfer/tree/main/smallthinker)
```