jerryzh168 committed on
Commit 3d272da · verified · 1 parent: c6246f9

Update README.md

Files changed (1): README.md (+5 −12)
README.md CHANGED
@@ -270,8 +270,7 @@ print(f"Peak Memory Usage: {mem:.02f} GB")
 
 Note the result of latency (benchmark_latency) is in seconds, and serving (benchmark_serving) is in number of requests per second.
 
-## benchmark_latency
-
+## Setup
 Need to install vllm nightly to get some recent changes
 ```Shell
 pip install vllm --pre --extra-index-url https://wheels.vllm.ai/nightly
@@ -282,7 +281,9 @@ Get vllm source code:
 git clone [email protected]:vllm-project/vllm.git
 ```
 
-Run the following under `vllm` root folder:
+Run the benchmarks under `vllm` root folder:
+
+## benchmark_latency
 
 ### baseline
 ```Shell
@@ -296,18 +297,10 @@ python benchmarks/benchmark_latency.py --input-len 256 --output-len 256 --model
 
 ## benchmark_serving
 
-We also benchmarked the throughput in a serving environment.
+We benchmarked the throughput in a serving environment.
 
 Download sharegpt dataset: `wget https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered/resolve/main/ShareGPT_V3_unfiltered_cleaned_split.json`
 Other datasets can be found in: https://github.com/vllm-project/vllm/tree/main/benchmarks
-
-Get vllm source code:
-```Shell
-git clone [email protected]:vllm-project/vllm.git
-```
-
-Run the following under `vllm` root folder:
-
 ### baseline
 Server:
 ```Shell