jerryzh168 committed
Commit d682ce6 · verified · 1 Parent(s): 36880bf

Update README.md

Files changed (1): README.md +8 -8
README.md CHANGED

# Model Performance

## Download vllm source code and install vllm
```
git clone git@github.com:vllm-project/vllm.git
VLLM_USE_PRECOMPILED=1 pip install .
```
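
`VLLM_USE_PRECOMPILED=1` makes pip reuse vLLM's precompiled binaries instead of building the CUDA kernels locally, which speeds up a source install considerably. As a quick sanity check after installing (my addition, not part of the original README), confirm the package imports and print its version:
```
python -c "import vllm; print(vllm.__version__)"
```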
 
## Download dataset
Download the ShareGPT dataset: `wget https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered/resolve/main/ShareGPT_V3_unfiltered_cleaned_split.json`

Other datasets can be found at: https://github.com/vllm-project/vllm/tree/main/benchmarks
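
To confirm the download before benchmarking, here is a small check (my addition, assuming the standard ShareGPT layout of a top-level JSON list of conversations):
```
# Count the conversations in the downloaded file (assumes a top-level JSON list)
python -c "import json; data = json.load(open('ShareGPT_V3_unfiltered_cleaned_split.json')); print(len(data), 'conversations')"
```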

## benchmark_latency

Run the following under the `vllm` source code root folder:

### baseline
```
python benchmarks/benchmark_latency.py --input-len 256 --output-len 256 --model microsoft/Phi-4-mini-instruct --batch-size 1
```

### float8dq
```
python benchmarks/benchmark_latency.py --input-len 256 --output-len 256 --model jerryzh168/phi4-mini-float8dq --batch-size 1
```
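
The two commands differ only in `--model`, so the reported latencies are directly comparable. If you also want to see how the gap behaves at larger batch sizes, a simple sweep works (my sketch; the batch sizes are illustrative, only `--batch-size 1` appears in the original):
```
# Illustrative sweep over both models and a few batch sizes
for model in microsoft/Phi-4-mini-instruct jerryzh168/phi4-mini-float8dq; do
  for bs in 1 8 32; do
    python benchmarks/benchmark_latency.py --input-len 256 --output-len 256 --model "$model" --batch-size "$bs"
  done
done
```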
 
## benchmark_serving

We also benchmarked the throughput in a serving environment.

Run the following under the `vllm` source code root folder:

### baseline
Server:
```
vllm serve microsoft/Phi-4-mini-instruct --tokenizer microsoft/Phi-4-mini-instruct -O3
```

Client:
```
python benchmarks/benchmark_serving.py --backend vllm --dataset-name sharegpt --tokenizer microsoft/Phi-4-mini-instruct --dataset-path ./ShareGPT_V3_unfiltered_cleaned_split.json --model microsoft/Phi-4-mini-instruct --num-prompts 1
```
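
`--num-prompts 1` sends a single request, which works as a smoke test; for a stable throughput number you would typically raise it (e.g. `--num-prompts 100`, my suggestion rather than the original setting). Before starting the client, you can also confirm the server is up; vLLM serves an OpenAI-compatible API on port 8000 by default:
```
# Check the server is serving the expected model (default port 8000)
curl http://localhost:8000/v1/models
```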

### float8dq
Server:
```
vllm serve jerryzh168/phi4-mini-float8dq --tokenizer microsoft/Phi-4-mini-instruct -O3
```
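
The client command for this variant is not shown above; presumably it mirrors the baseline client with the quantized model id (a sketch, not from the original):
```
# Assumed client command, mirroring the baseline section with the float8dq model
python benchmarks/benchmark_serving.py --backend vllm --dataset-name sharegpt --tokenizer microsoft/Phi-4-mini-instruct --dataset-path ./ShareGPT_V3_unfiltered_cleaned_split.json --model jerryzh168/phi4-mini-float8dq --num-prompts 1
```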