liushaowei committed
Commit · d413755
1 Parent(s): bb39c64
update readme format
README.md CHANGED
@@ -9,9 +9,9 @@ library_name: transformers
 <!-- # Muon is Scalable For LLM Training -->
 
 <div align="center">
-<a href="https://github.com/MoonshotAI/dummy.pdf"><img src="figures/logo.png" height="16" width="16" style="vertical-align:middle"><b> Tech Report</b></a> |
-<a href="https://huggingface.co/moonshotai/Moonlight"><img src="https://huggingface.co/front/assets/huggingface_logo-noborder.svg" height="16" width="16" style="vertical-align:middle"><b> HuggingFace</b></a> |
-<a href="#"><img src="figures/megatron.png" height="16" width="16" style="vertical-align:middle"><b>Megatron(coming soon)</b></a>
+<a href="https://github.com/MoonshotAI/dummy.pdf" ><img src="figures/logo.png" height="16" width="16" style="display: inline-block; vertical-align: middle; margin: 2px;"><b style="display: inline-block;"> Tech Report</b></a> |
+<a href="https://huggingface.co/moonshotai/Moonlight"><img src="https://huggingface.co/front/assets/huggingface_logo-noborder.svg" height="16" width="16" style="display: inline-block; vertical-align: middle; margin: 2px;"><b style="display: inline-block;"> HuggingFace</b></a> |
+<a href="#"><img src="figures/megatron.png" height="16" width="16" style="display: inline-block; vertical-align: middle; margin: 2px;"><b style="display: inline-block;">Megatron(coming soon)</b></a>
 </div>
 
 
@@ -52,7 +52,7 @@ We compared Moonlight with SOTA public models at similar scale:
 - **LLAMA3-3B** is a 3B-parameter dense model trained with 9T tokens
 - **Qwen2.5-3B** is a 3B-parameter dense model trained with 18T tokens
 - **Deepseek-v2-Lite** is a 2.4B/16B-parameter MOE model trained with 5.7T tokens
-
+<div align="center">
 | | **Benchmark (Metric)** | **Llama3.2-3B** | **Qwen2.5-3B** | **DSV2-Lite** | **Moonlight** |
 |---|---|---|---|---|---|
 | | Activated Param† | 2.81B | 2.77B | 2.24B | 2.24B |
@@ -70,6 +70,7 @@ We compared Moonlight with SOTA public models at similar scale:
 | | CMath | - | 80.0 | 58.4 | **81.1** |
 | **Chinese** | C-Eval | - | 75.0 | 60.3 | **77.2** |
 | | CMMLU | - | 75.0 | 64.3 | **78.2** |
+</div>
 
 *Qwen 2 & 2.5 reports didn't disclose their optimizer information. †The reported parameter counts exclude the embedding parameters. ‡We test all listed models with the full set of TriviaQA.*
 