---
license: mit
license_link: https://huggingface.co/rednote-hilab/dots.llm1.inst/blob/main/LICENSE
pipeline_tag: text-generation
base_model:
- rednote-hilab/dots.llm1.inst
tags:
- chat
- unsloth
library_name: transformers
language:
- en
- zh
---
<div>
<p style="margin-top: 0;margin-bottom: 0;">
<em><a href="https://docs.unsloth.ai/basics/unsloth-dynamic-v2.0-gguf">Unsloth Dynamic 2.0</a> achieves superior accuracy & outperforms other leading quants.</em>
</p>
<div style="display: flex; gap: 5px; align-items: center; ">
<a href="https://github.com/unslothai/unsloth/">
<img src="https://github.com/unslothai/unsloth/raw/main/images/unsloth%20new%20logo.png" width="133">
</a>
<a href="https://discord.gg/unsloth">
<img src="https://github.com/unslothai/unsloth/raw/main/images/Discord%20button.png" width="173">
</a>
<a href="https://docs.unsloth.ai/basics/qwen3-how-to-run-and-fine-tune">
<img src="https://raw.githubusercontent.com/unslothai/unsloth/refs/heads/main/images/documentation%20green%20button.png" width="143">
</a>
</div>
</div>


# dots1

<p align="center">
    <img src="figures/new_logo2.png" width="300"/>
</p>

<p align="center">
&nbsp;&nbsp;🤗 <a href="https://huggingface.co/rednote-hilab">Hugging Face</a>&nbsp;&nbsp; | &nbsp;&nbsp; 📑 <a href="https://www.arxiv.org/abs/2506.05767">Paper</a> &nbsp;&nbsp;
<br>
🖥️ <a href="https://huggingface.co/spaces/rednote-hilab/dots-demo">Demo</a>&nbsp;&nbsp; | &nbsp;&nbsp;💬 <a href="figures/wechat.png">WeChat (微信)</a>&nbsp;&nbsp; | &nbsp;&nbsp;📕 <a href="https://www.xiaohongshu.com/user/profile/683ffe42000000001d021a4c">rednote</a>&nbsp;&nbsp;
</p>


Visit our Hugging Face page (links above) and search for checkpoints whose names start with `dots.llm1`, or browse the [dots1 collection](https://huggingface.co/collections/rednote-hilab/dotsllm1-68246aaaaba3363374a8aa7c), and you will find all you need. Enjoy!


## News

- 2025.06.06: We released the `dots.llm1` series. Check our [report](https://github.com/rednote-hilab/dots.llm1/blob/main/dots1_tech_report.pdf) for more details!


## 1. Introduction

The `dots.llm1` model is a large-scale MoE model that activates 14B parameters out of a total of 142B parameters, delivering performance on par with state-of-the-art models.
Leveraging our meticulously crafted and efficient data processing pipeline, `dots.llm1` achieves performance comparable to Qwen2.5-72B after being pretrained on a high-quality corpus without synthetic data. To foster further research, we open-source intermediate training checkpoints spanning the entire training process, providing valuable insights into the learning dynamics of large language models.

<p align="center">
  <img width="90%" src="./figures/performance.png">
</p>

## 2. Model Summary

**This repo contains the base and instruction-tuned `dots.llm1` models**, which have the following features:

- Type: an MoE model with 14B activated and 142B total parameters, trained on a high-quality corpus.
- Training Stages: pretraining and SFT.
- Architecture: multi-head attention with QK-Norm in the attention layers; fine-grained MoE utilizing the top 6 of 128 routed experts, plus 2 shared experts (see the illustrative routing sketch after the highlights below).
- Number of Layers: 62
- Number of Attention Heads: 32
- Supported Languages: English, Chinese
- Context Length: 32,768 tokens
- License: MIT

The highlights from `dots.llm1` include:

- **Enhanced Data Processing**: We propose a scalable and fine-grained *three-stage* data processing framework designed to generate large-scale, high-quality, and diverse data for pretraining.
- **No Synthetic Data during Pretraining**: Only high-quality non-synthetic tokens were used in base model pretraining.
- **Performance and Cost Efficiency**: `dots.llm1` is an open-source model that activates only *14B* parameters at inference, delivering both comprehensive capabilities and high computational efficiency.
- **Infrastructure**: We introduce an innovative MoE all-to-all communication and computation overlapping recipe based on interleaved 1F1B pipeline scheduling and an efficient grouped GEMM implementation to boost computational efficiency.
- **Open Accessibility to Model Dynamics**: Intermediate model checkpoints are released spanning the entire training process, facilitating future research into the learning dynamics of large language models.

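To make the routing scheme above concrete, here is a minimal, self-contained PyTorch sketch of top-6-of-128 expert routing with 2 shared experts. It uses toy dimensions and a naive per-token loop for clarity; it is an illustration of the general technique, not the actual `dots.llm1` implementation.

```python
# Illustrative sketch of fine-grained MoE routing with shared experts.
# Toy dimensions and naive loops for readability; NOT the dots.llm1 code.
import torch
import torch.nn as nn
import torch.nn.functional as F

def make_expert(d_model, d_ff):
    # Each expert is a small feed-forward network.
    return nn.Sequential(nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model))

class ToyMoELayer(nn.Module):
    def __init__(self, d_model=64, d_ff=128, n_routed=128, n_shared=2, top_k=6):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_routed, bias=False)
        self.routed_experts = nn.ModuleList(make_expert(d_model, d_ff) for _ in range(n_routed))
        self.shared_experts = nn.ModuleList(make_expert(d_model, d_ff) for _ in range(n_shared))

    def forward(self, x):  # x: (n_tokens, d_model)
        # Route each token to its top-k experts and renormalize the gate weights.
        scores = F.softmax(self.router(x), dim=-1)
        weights, idx = scores.topk(self.top_k, dim=-1)
        weights = weights / weights.sum(dim=-1, keepdim=True)
        # Shared experts process every token unconditionally.
        shared = sum(e(x) for e in self.shared_experts)
        # Naive per-token loop over the selected routed experts (clarity over speed).
        routed = []
        for t in range(x.size(0)):
            routed.append(sum(w * self.routed_experts[int(i)](x[t])
                              for w, i in zip(weights[t], idx[t])))
        return shared + torch.stack(routed)

layer = ToyMoELayer()
print(layer(torch.randn(4, 64)).shape)  # torch.Size([4, 64])
```

Only 6 of the 128 routed experts run per token, which is why inference activates a small fraction of the total parameter count.
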
## 3. Example Usage

### Model Downloads

<div align="center">

| **Model** | **#Total Params** | **#Activated Params** | **Context Length** | **Download Link** |
| :------------: | :------------: | :------------: | :------------: | :------------: |
| dots.llm1.base | 142B | 14B | 32K | [🤗 Hugging Face](https://huggingface.co/rednote-hilab/dots.llm1.base) |
| dots.llm1.inst | 142B | 14B | 32K | [🤗 Hugging Face](https://huggingface.co/rednote-hilab/dots.llm1.inst) |

</div>
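
For example, you can pre-fetch a checkpoint into your local Hugging Face cache with the `huggingface-cli` tool (assuming the `huggingface_hub` package is installed):

```shell
# Download the instruction-tuned checkpoint to the local HF cache.
huggingface-cli download rednote-hilab/dots.llm1.inst
```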

### Docker (recommended)

The Docker images are available on [Docker Hub](https://hub.docker.com/repository/docker/rednotehilab/dots1/tags), based on the official vLLM images.

You can start an OpenAI-compatible server via vLLM:

```shell
docker run --gpus all \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    -p 8000:8000 \
    --ipc=host \
    rednotehilab/dots1:vllm-openai-v0.9.0.1 \
    --model rednote-hilab/dots.llm1.inst \
    --tensor-parallel-size 8 \
    --trust-remote-code \
    --served-model-name dots1
```

You can then verify that the model is running with a request like:

```shell
curl http://localhost:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "dots1",
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Who won the world series in 2020?"}
        ],
        "max_tokens": 32,
        "temperature": 0
    }'
```
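
If you prefer Python, here is a minimal sketch using the `openai` client library against the same endpoint (assuming the server above is running and `pip install openai`):

```python
# Query the server started above via the OpenAI-compatible API.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="dots1",  # must match --served-model-name above
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Who won the world series in 2020?"},
    ],
    max_tokens=32,
    temperature=0,
)
print(response.choices[0].message.content)
```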


### Inference with Hugging Face Transformers

We are working to merge support for `dots.llm1` into Hugging Face Transformers ([PR #38143](https://github.com/huggingface/transformers/pull/38143)).

#### Text Completion

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "rednote-hilab/dots.llm1.base"
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Load the base model in bfloat16, sharding it across available GPUs.
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto", torch_dtype=torch.bfloat16)

text = "An attention function can be described as mapping a query and a set of key-value pairs to an output, where the query, keys, values, and output are all vectors. The output is"
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs.to(model.device), max_new_tokens=100)
result = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(result)
```

#### Chat Completion

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "rednote-hilab/dots.llm1.inst"
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Load the instruction-tuned model in bfloat16, sharding it across available GPUs.
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto", torch_dtype=torch.bfloat16)

messages = [
    {"role": "user", "content": "Write a piece of quicksort code in C++"}
]
input_tensor = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
outputs = model.generate(input_tensor.to(model.device), max_new_tokens=200)

# Decode only the newly generated tokens, skipping the prompt.
result = tokenizer.decode(outputs[0][input_tensor.shape[1]:], skip_special_tokens=True)
print(result)
```
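
For interactive use you may want tokens printed as they are generated. A minimal sketch with Transformers' `TextStreamer`, reusing `model`, `tokenizer`, and `input_tensor` from the chat example above:

```python
from transformers import TextStreamer

# Stream decoded tokens to stdout as generate() produces them.
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
_ = model.generate(input_tensor.to(model.device), max_new_tokens=200, streamer=streamer)
```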

### Inference with vLLM

[vLLM](https://github.com/vllm-project/vllm) is a high-throughput and memory-efficient inference and serving engine for LLMs. Official support for `dots.llm1` is covered in [PR #18254](https://github.com/vllm-project/vllm/pull/18254).

```shell
vllm serve dots.llm1.inst --port 8000 --tensor-parallel-size 8
```

An OpenAI-compatible API will be available at `http://localhost:8000/v1`.
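
Besides serving, vLLM also supports offline batch inference. A minimal sketch, assuming a vLLM build that includes the `dots.llm1` support from the PR above:

```python
# Offline inference sketch (assumes vLLM with dots.llm1 support installed).
from vllm import LLM, SamplingParams

llm = LLM(model="rednote-hilab/dots.llm1.inst", tensor_parallel_size=8, trust_remote_code=True)
params = SamplingParams(temperature=0.7, max_tokens=200)

outputs = llm.chat(
    [{"role": "user", "content": "Write a piece of quicksort code in C++"}],
    params,
)
print(outputs[0].outputs[0].text)
```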

### Inference with SGLang

[SGLang](https://github.com/sgl-project/sglang) is a fast serving framework for large language models and vision-language models that can launch a server with an OpenAI-compatible API. Official support for `dots.llm1` is covered in [PR #6471](https://github.com/sgl-project/sglang/pull/6471).

Getting started is as simple as running:

```shell
python -m sglang.launch_server --model-path dots.llm1.inst --tp 8 --host 0.0.0.0 --port 8000
```

An OpenAI-compatible API will be available at `http://localhost:8000/v1`.

## 4. Evaluation Results

Detailed evaluation results are reported in this [📑 report](https://github.com/rednote-hilab/dots.llm1/blob/main/dots1_tech_report.pdf).

## Citation

If you find `dots.llm1` useful or want to use it in your projects, please cite our paper:

```bibtex
@misc{huo2025dotsllm1technicalreport,
      title={dots.llm1 Technical Report},
      author={Bi Huo and Bin Tu and Cheng Qin and Da Zheng and Debing Zhang and Dongjie Zhang and En Li and Fu Guo and Jian Yao and Jie Lou and Junfeng Tian and Li Hu and Ran Zhu and Shengdong Chen and Shuo Liu and Su Guang and Te Wo and Weijun Zhang and Xiaoming Shi and Xinxin Peng and Xing Wu and Yawen Liu and Yuqiu Ji and Ze Wen and Zhenhai Liu and Zichao Li and Zilong Liao},
      year={2025},
      eprint={2506.05767},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2506.05767},
}
```