Update README.md

README.md CHANGED

@@ -95,9 +95,10 @@ lm_eval --model hf --model_args pretrained=jerryzh168/phi4-mini-float8dq --tasks
 # Model Performance
-# Install latest vllm to get the most recent changes
 ```
-pip install git+https://github.com/vllm-project/vllm.git
 ```
 # Download dataset
@@ -105,6 +106,9 @@ Download sharegpt dataset: `wget https://huggingface.co/datasets/anon8231489123/
 Other datasets can be found in: https://github.com/vllm-project/vllm/tree/main/benchmarks
 # benchmark_latency
 ## baseline
 ```
 python benchmarks/benchmark_latency.py --input-len 256 --output-len 256 --model microsoft/Phi-4-mini-instruct --batch-size 1
@@ -119,6 +123,8 @@ python benchmarks/benchmark_latency.py --input-len 256 --output-len 256 --model
 We also benchmarked the throughput in a serving environment.
 ## baseline
 Server:
 ```

 # Model Performance
+# Download vllm source code and install vllm
 ```
+git clone git@github.com:vllm-project/vllm.git
+VLLM_USE_PRECOMPILED=1 pip install .
 ```
 # Download dataset
 Other datasets can be found in: https://github.com/vllm-project/vllm/tree/main/benchmarks
 # benchmark_latency
+Run the following under `vllm` source code root folder:
 ## baseline
 ```
 python benchmarks/benchmark_latency.py --input-len 256 --output-len 256 --model microsoft/Phi-4-mini-instruct --batch-size 1
 We also benchmarked the throughput in a serving environment.
+Run the following under `vllm` source code root folder:
 ## baseline
 Server:
 ```