Lil2J committed · Commit 4671a5b · verified · 1 Parent(s): 9184227

Create README.md

Files changed (1):
  1. README.md +46 -0
README.md ADDED
@@ -0,0 +1,46 @@
---
license: mit
base_model:
- Qwen/Qwen3-8B
---

## Introduction
We adapted the official Eagle3 speculative sampling training method to train a draft model for Qwen3-8B.

After implementing Eagle3, the inference throughput of Qwen3-8B served with the SGLang framework on a single H200 GPU improved from 187 tokens/s to 365 tokens/s, a TPS (tokens per second) improvement of nearly 100%.

Remarkably, on a single RTX 5090, the TPS of Qwen3-8B-Eagle3 increased from 90 to 220 tokens/s, an improvement of nearly 140%.

| Model           | GPU      | TPS (tokens/s) |
|-----------------|----------|----------------|
| Qwen3-8B        | RTX 5090 | 90             |
| Qwen3-8B-Eagle3 | RTX 5090 | 220            |
| Qwen3-8B        | H200     | 187            |
| Qwen3-8B-Eagle3 | H200     | 365            |

## How to use

The launch command for using Eagle3 with SGLang is:

```bash
python3 -m sglang.launch_server \
  --model Qwen/Qwen3-8B \
  --speculative-algorithm EAGLE3 \
  --speculative-draft-model-path Tengyunw/qwen3_8b_eagle3 \
  --speculative-num-steps 6 \
  --speculative-eagle-topk 10 \
  --speculative-num-draft-tokens 32 \
  --mem-fraction 0.9 \
  --cuda-graph-max-bs 2 \
  --dtype bfloat16
```
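
Once the server is running, requests go through SGLang's OpenAI-compatible API as usual; Eagle3 speculative decoding is transparent to the client. The snippet below is only a minimal sketch for sanity-checking throughput against the table above: the default port 30000, the prompt, and the token budget are assumptions, not the setup behind the reported numbers.

```python
# Minimal client sketch (assumes the server above is running on SGLang's
# default port 30000 and the `openai` Python package is installed).
# This is NOT the authors' benchmark script; it only gives a rough tokens/s figure.
import time
from openai import OpenAI

client = OpenAI(base_url="http://localhost:30000/v1", api_key="EMPTY")

start = time.time()
response = client.chat.completions.create(
    model="Qwen/Qwen3-8B",  # should match the name the server reports under /v1/models
    messages=[{"role": "user", "content": "Explain speculative decoding in one paragraph."}],
    max_tokens=256,
    temperature=0,
)
elapsed = time.time() - start

out_tokens = response.usage.completion_tokens
print(response.choices[0].message.content)
print(f"{out_tokens} tokens in {elapsed:.2f}s -> {out_tokens / elapsed:.1f} tokens/s")
```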

## How to train

Training dataset: ultrachat_200k.
Only the prompts from this dataset were used for data synthesis; the synthesized data is then used to train the Eagle3 draft module (a minimal sketch of this step is shown after the dataset statistics below).

Dataset size: 600K samples, 1B tokens.
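
The card does not include the synthesis script, so the following is only a rough illustration of the step described above: take prompts from ultrachat_200k and let Qwen3-8B itself generate the responses, producing prompt/response pairs for training the Eagle3 draft module. The dataset mirror, field names, generation parameters, and output format are all assumptions.

```python
# Rough sketch of the data-synthesis step (NOT the authors' script).
# Assumptions: prompts come from the HuggingFaceH4/ultrachat_200k "prompt"
# column, and Qwen3-8B generates the responses greedily.
import json
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-8B")
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-8B", torch_dtype="auto", device_map="auto"
)

prompts = load_dataset("HuggingFaceH4/ultrachat_200k", split="train_sft")["prompt"]

with open("eagle3_train_data.jsonl", "w") as f:   # hypothetical output file name
    for prompt in prompts[:100]:                  # small slice for illustration
        messages = [{"role": "user", "content": prompt}]
        input_ids = tokenizer.apply_chat_template(
            messages, add_generation_prompt=True, return_tensors="pt"
        ).to(model.device)
        output_ids = model.generate(input_ids, max_new_tokens=512, do_sample=False)
        response = tokenizer.decode(
            output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True
        )
        f.write(json.dumps({"prompt": prompt, "response": response}) + "\n")
```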

Evaluation datasets: ShareGPT, GSM8K, HumanEval, MT-Bench, Alpaca.

Our ShareGPT test data is provided in the eagle_data.jsonl file in this repository.
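
The per-record schema of eagle_data.jsonl is not documented on this card, so the sketch below simply loads the file as JSON lines and prints the keys of the first few records; nothing about the field names is assumed.

```python
# Quick inspection of the ShareGPT test data shipped with this repo.
# The record schema is not documented here, so we only print each record's keys.
import json

with open("eagle_data.jsonl") as f:
    for i, line in enumerate(f):
        record = json.loads(line)
        print(i, sorted(record.keys()))
        if i >= 4:  # look at the first five records only
            break
```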