Create README.md
Browse files
README.md
ADDED
@@ -0,0 +1,46 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
---
|
2 |
+
license: mit
|
3 |
+
base_model:
|
4 |
+
- Qwen/Qwen3-8B
|
5 |
+
---
|
6 |
+
|
7 |
+
## Introduce
|
8 |
+
We adapted the official speculative sampling training method, Eagle3, for training on Qwen3-8B.
|
9 |
+
|
10 |
+
After implementing Eagle3, the inference performance of Qwen3-8B using the SGLang framework on a single H200 GPU improved from 187 tokens/s to 365 tokens/s.
|
11 |
+
|
12 |
+
The TPS (tokens per second) improvement reached nearly 100%.
|
13 |
+
|
14 |
+
Amazingly, on a single RTX 5090, the TPS (transactions per second) of Qwen3-8B-Eagle3 increased from 90 to 220.
|
15 |
+
|
16 |
+
The TPS (tokens per second) improvement reached nearly 140%.
|
17 |
+
|
18 |
+
| model | gpu | tps |
|
19 |
+
|---------|---------|---------|
|
20 |
+
| qwen3-8b | 5090 | 90 |
|
21 |
+
| qwen3-8b-eagle3 | 5090 | 220 |
|
22 |
+
| qwen3-8b | h200 | 187 |
|
23 |
+
| qwen3-8b-eagle3 | h200 | 365 |
|
24 |
+
|
25 |
+
## How to use
|
26 |
+
|
27 |
+
|
28 |
+
The launch command for using Eagle3 with SGLang is:
|
29 |
+
|
30 |
+
```python
|
31 |
+
python3 -m sglang.launch_server --model Qwen/Qwen3-8B --speculative-algorithm EAGLE3 --speculative-draft-model-path Tengyunw/qwen3_8b_eagle3 --speculative-num-steps 6 --speculative-eagle-topk 10 --speculative-num-draft-tokens 32 --mem-fraction 0.9 --cuda-graph-max-bs 2 --dtype bfloat16
|
32 |
+
|
33 |
+
```
|
34 |
+
|
35 |
+
## How to train
|
36 |
+
|
37 |
+
Training Dataset:
|
38 |
+
ultrachat_200k.
|
39 |
+
Only the prompts from these datasets were utilized for data synthesis. This synthesized data is used to train the Eagle modules.
|
40 |
+
|
41 |
+
dataset nums: 600K samples,1B tokens
|
42 |
+
|
43 |
+
Evaluation Dataset:
|
44 |
+
ShareGPT,GSM8K,HUAMEVAL,MT-BENCH,APLCA
|
45 |
+
|
46 |
+
Our Sharegpt test data is located in the eagle_data.jsonl file under this directory.
|