jerryzh168 committed on
Commit ae4c6ae · verified · 1 Parent(s): 69fb0e9

Update README.md

Files changed (1)
  1. README.md +4 -2
README.md CHANGED
@@ -102,11 +102,13 @@ lm_eval --model hf --model_args pretrained=jerryzh168/phi4-mini-float8dq --tasks
102
  | Benchmark | | |
103
  |----------------------------------|----------------|--------------------------|
104
  | | Phi-4 mini-Ins | phi4-mini-float8dq |
105
- | latency (batch_size=1) | 1.64 s | 1.41s (16% speedup) |
106
- | latency (batch_size=128) | 3.1 s | 2.72s (14% speedup) |
107
  | serving (num_prompts=1) | 1.35 req/s | 1.57 req/s (16% speedup) |
108
  | serving (num_prompts=1000) | 66.68 req/s | 80.53 req/s (21% speedup)|
109
 
 
 
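The speedup percentages in the table above can be checked with a few lines of arithmetic. The sketch below is not part of the README. It assumes "speedup" means the ratio of the better value to the worse value, minus one, where lower is better for latency and higher is better for serving throughput. The helper name `speedup_percent` is hypothetical.

```python
def speedup_percent(baseline: float, quantized: float, higher_is_better: bool) -> int:
    """Return the relative speedup of `quantized` over `baseline`, in whole percent.

    Assumed definition (not stated in the README):
      latency (lower is better):     baseline / quantized - 1
      throughput (higher is better): quantized / baseline - 1
    """
    ratio = quantized / baseline if higher_is_better else baseline / quantized
    return round((ratio - 1.0) * 100)


# Values copied from the benchmark table: (row, Phi-4 mini-Ins, phi4-mini-float8dq, higher_is_better)
rows = [
    ("latency (batch_size=1)", 1.64, 1.41, False),       # seconds
    ("latency (batch_size=128)", 3.1, 2.72, False),      # seconds
    ("serving (num_prompts=1)", 1.35, 1.57, True),       # req/s
    ("serving (num_prompts=1000)", 66.68, 80.53, True),  # req/s
]

for name, base, quant, higher_is_better in rows:
    print(f"{name}: {speedup_percent(base, quant, higher_is_better)}% speedup")
```

Under this assumed definition, the four rows give 16%, 14%, 16%, and 21%, which matches the figures in the table.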
110
  ## Download vllm source code and install vllm
111
  ```
112
  git clone git@github.com:vllm-project/vllm.git
 
102
  | Benchmark | | |
103
  |----------------------------------|----------------|--------------------------|
104
  | | Phi-4 mini-Ins | phi4-mini-float8dq |
105
+ | latency (batch_size=1) | 1.64s | 1.41s (16% speedup) |
106
+ | latency (batch_size=128) | 3.1s | 2.72s (14% speedup) |
107
  | serving (num_prompts=1) | 1.35 req/s | 1.57 req/s (16% speedup) |
108
  | serving (num_prompts=1000) | 66.68 req/s | 80.53 req/s (21% speedup)|
109
 
110
+ Note: latency results (benchmark_latency) are reported in seconds, and serving results (benchmark_serving) are reported in requests per second.
111
+
112
  ## Download vllm source code and install vllm
113
  ```
114
  git clone git@github.com:vllm-project/vllm.git