Improve model card: Add library, paper, GitHub links, and MoE tag
This PR improves the model card for **SmallThinker-21BA3B-Instruct** by:
- Adding `library_name: transformers` to the metadata, which enables the "Use in Transformers" widget on the model page.
- Adding the `moe` tag to the metadata for better discoverability, as this model uses a Mixture-of-Experts architecture.
- Including a direct link to the official Hugging Face paper page: [SmallThinker: A Family of Efficient Large Language Models Natively Trained for Local Deployment](https://huggingface.co/papers/2507.20984).
- Adding a direct link to the main GitHub repository: [https://github.com/SJTU-IPADS/SmallThinker](https://github.com/SJTU-IPADS/SmallThinker).
These updates make the model more accessible and easier to understand for the community.
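
Since the metadata now declares `library_name: transformers`, the checkpoint can be loaded through the standard `transformers` API. Below is a minimal sketch, assuming the repo id is `PowerInfer/SmallThinker-21BA3B-Instruct` and that the MoE checkpoint ships custom modeling code (hence `trust_remote_code=True`); neither detail is confirmed by this card:

```python
# Minimal loading sketch. The repo id and the need for trust_remote_code
# are assumptions, not confirmed by the model card.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "PowerInfer/SmallThinker-21BA3B-Instruct"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",      # keep the dtype stored in the checkpoint
    device_map="auto",       # spread weights across available devices
    trust_remote_code=True,  # MoE checkpoints often ship custom modeling code
)

messages = [{"role": "user", "content": "Briefly explain Mixture-of-Experts models."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```
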
The updated model card:

---
language:
- en
license: apache-2.0
pipeline_tag: text-generation
library_name: transformers
tags:
- moe
---

## Introduction

<p align="center">
  🤗 <a href="https://huggingface.co/PowerInfer">Hugging Face</a>   |   🤖 <a href="https://modelscope.cn/organization/PowerInfer">ModelScope</a>   |    📑 <a href="https://github.com/SJTU-IPADS/SmallThinker/blob/main/smallthinker-technical-report.pdf">Technical Report</a>   
   📚 <a href="https://huggingface.co/papers/2507.20984">Paper</a>    |    💻 <a href="https://github.com/SJTU-IPADS/SmallThinker">GitHub Repo</a>   
</p>

SmallThinker is a family of **on-device native** Mixture-of-Experts (MoE) language models specially designed for local deployment. Designed from the ground up for resource-constrained environments, SmallThinker brings powerful, private, and low-latency AI directly to your personal devices, without relying on the cloud.

## Performance

Note: The model is trained mainly on English.

| Model | MMLU | GPQA-diamond | MATH-500 | IFEVAL | LIVEBENCH | HUMANEVAL | Average |
|---|---|---|---|---|---|---|---|
| **SmallThinker-21BA3B-Instruct** | 84.43 | <u>55.05</u> | 82.4 | **85.77** | **60.3** | <u>89.63</u> | **76.26** |
| Gemma3-12b-it | 78.52 | 34.85 | 82.4 | 74.68 | 44.5 | 82.93 | 66.31 |
| Qwen3-14B | <u>84.82</u> | 50 | **84.6** | <u>85.21</u> | <u>59.5</u> | 88.41 | <u>75.42</u> |
| Qwen3-30BA3B | **85.1** | 44.4 | <u>84.4</u> | 84.29 | 58.8 | **90.24** | 74.54 |
| Qwen3-8B | 81.79 | 38.89 | 81.6 | 83.92 | 49.5 | 85.9 | 70.26 |
| Phi-4-14B | 84.58 | **55.45** | 80.2 | 63.22 | 42.4 | 87.2 | 68.84 |

For the MMLU evaluation, we use a 0-shot CoT setting. All models are evaluated in non-thinking mode.
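
To make the setting concrete: in a 0-shot CoT run, the model sees each question once, with no worked examples, and is asked to reason before committing to an answer. The sketch below shows one common shape for such a prompt and a tolerant answer extractor; the actual harness, prompts, and parsing rules are not specified in the card, so everything here is illustrative.

```python
import re

# Illustrative 0-shot chain-of-thought prompt for an MMLU-style question.
# The real evaluation harness is not specified in the card.
def zero_shot_cot_prompt(question: str, choices: list[str]) -> str:
    options = "\n".join(f"{l}. {c}" for l, c in zip("ABCD", choices))
    return (
        f"{question}\n{options}\n\n"
        "Let's think step by step, then finish with a single line "
        'of the form "Answer: X".'
    )

def extract_answer(completion: str) -> str | None:
    # Use the last match so intermediate reasoning cannot be mistaken
    # for the final answer.
    hits = re.findall(r"Answer:\s*([ABCD])", completion)
    return hits[-1] if hits else None

print(zero_shot_cot_prompt(
    "Which gas makes up most of Earth's atmosphere?",
    ["Oxygen", "Nitrogen", "Carbon dioxide", "Argon"],
))
```
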

## Speed

| Model | Memory (GiB) | i9 14900 | 1+13 8ge4 | rk3588 (16G) | Raspberry Pi 5 |
|---|---|---|---|---|---|
| SmallThinker 21B+sparse | 11.47 | 30.19 | 23.03 | 10.84 | 6.61 |
| SmallThinker 21B+sparse+limited memory | limit 8G | 20.30 | 15.50 | 8.56 | - |
| Qwen3 30B A3B | 16.20 | 33.52 | 20.18 | 9.07 | - |
| Qwen3 30B A3B+limited memory | limit 8G | 10.11 | 0.18 | 6.32 | - |
| Gemma 3n E2B | 1G, theoretically | 36.88 | 27.06 | 12.50 | 6.66 |
| Gemma 3n E4B | 2G, theoretically | 21.93 | 16.58 | 7.37 | 4.01 |

Note: the i9 14900 and 1+13 8ge4 runs use 4 threads; the other platforms use the number of threads that achieves maximum speed. All models here have been quantized to q4_0, and the throughput figures are decoding speeds in tokens/s.

You can deploy SmallThinker with offloading support using [PowerInfer](https://github.com/SJTU-IPADS/PowerInfer/tree/main/smallthinker).
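
As a rough cross-check of the Memory column: q4_0 stores about 4.5 bits per weight (each 32-weight block holds 16 bytes of 4-bit values plus a 2-byte scale), so the weights of a 21B-parameter model should take around 11 GiB, consistent with the 11.47 GiB reported above. A back-of-envelope sketch, taking the parameter count from the model name and ignoring the KV cache and runtime buffers:

```python
# Back-of-envelope q4_0 weight-memory estimate. Ignores the KV cache,
# activations, and runtime buffers, so it slightly undershoots the table.
params = 21e9            # total parameters, from the "21B" in the model name
bits_per_weight = 4.5    # q4_0: 16 bytes of quants + 2-byte scale per 32 weights
gib = params * bits_per_weight / 8 / 2**30
print(f"~{gib:.1f} GiB of weights")  # ~11.0 GiB vs. 11.47 GiB measured
```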