liushaowei committed
Commit d413755 · 1 Parent(s): bb39c64

update readme format

Files changed (1)
  1. README.md +5 -4
README.md CHANGED
@@ -9,9 +9,9 @@ library_name: transformers
 <!-- # Muon is Scalable For LLM Training -->
 
 <div align="center">
- <a href="https://github.com/MoonshotAI/dummy.pdf"><img src="figures/logo.png" height="16" width="16" style="vertical-align:middle"><b> Tech Report</b></a> |
- <a href="https://huggingface.co/moonshotai/Moonlight"><img src="https://huggingface.co/front/assets/huggingface_logo-noborder.svg" height="16" width="16" style="vertical-align:middle"><b> HuggingFace</b></a> |
- <a href="#"><img src="figures/megatron.png" height="16" width="16" style="vertical-align:middle"><b>Megatron(coming soon)</b></a>
+ <a href="https://github.com/MoonshotAI/dummy.pdf" ><img src="figures/logo.png" height="16" width="16" style="display: inline-block; vertical-align: middle; margin: 2px;"><b style="display: inline-block;"> Tech Report</b></a> |
+ <a href="https://huggingface.co/moonshotai/Moonlight"><img src="https://huggingface.co/front/assets/huggingface_logo-noborder.svg" height="16" width="16" style="display: inline-block; vertical-align: middle; margin: 2px;"><b style="display: inline-block;"> HuggingFace</b></a> |
+ <a href="#"><img src="figures/megatron.png" height="16" width="16" style="display: inline-block; vertical-align: middle; margin: 2px;"><b style="display: inline-block;">Megatron(coming soon)</b></a>
 </div>
 
 
@@ -52,7 +52,7 @@ We compared Moonlight with SOTA public models at similar scale:
 - **LLAMA3-3B** is a 3B-parameter dense model trained with 9T tokens
 - **Qwen2.5-3B** is a 3B-parameter dense model trained with 18T tokens
 - **Deepseek-v2-Lite** is a 2.4B/16B-parameter MOE model trained with 5.7T tokens
-
+ <div align="center">
 | | **Benchmark (Metric)** | **Llama3.2-3B** | **Qwen2.5-3B** | **DSV2-Lite** | **Moonlight** |
 |---|---|---|---|---|---|
 | | Activated Param† | 2.81B | 2.77B | 2.24B | 2.24B |
@@ -70,6 +70,7 @@ We compared Moonlight with SOTA public models at similar scale:
 | | CMath | - | 80.0 | 58.4 | **81.1** |
 | **Chinese** | C-Eval | - | 75.0 | 60.3 | **77.2** |
 | | CMMLU | - | 75.0 | 64.3 | **78.2** |
+ </div>
 
 *Qwen 2 & 2.5 reports didn't disclose their optimizer information. †The reported parameter counts exclude the embedding parameters. ‡We test all listed models with the full set of TriviaQA.*
 