jerryzh168 committed on
Commit 36880bf · verified · 1 Parent(s): f38ad3d

Update README.md

Files changed (1)
  1. README.md +8 -2
README.md CHANGED
@@ -95,9 +95,10 @@ lm_eval --model hf --model_args pretrained=jerryzh168/phi4-mini-float8dq --tasks
 
 # Model Performance
 
-# Install latest vllm to get the most recent changes
+# Download vllm source code and install vllm
 ```
-pip install git+https://github.com/vllm-project/vllm.git
+git clone git@github.com:vllm-project/vllm.git
+VLLM_USE_PRECOMPILED=1 pip install .
 ```
 
 # Download dataset
@@ -105,6 +106,9 @@ Download sharegpt dataset: `wget https://huggingface.co/datasets/anon8231489123/
 
 Other datasets can be found in: https://github.com/vllm-project/vllm/tree/main/benchmarks
 # benchmark_latency
+
+Run the following under `vllm` source code root folder:
+
 ## baseline
 ```
 python benchmarks/benchmark_latency.py --input-len 256 --output-len 256 --model microsoft/Phi-4-mini-instruct --batch-size 1
@@ -119,6 +123,8 @@ python benchmarks/benchmark_latency.py --input-len 256 --output-len 256 --model
 
 We also benchmarked the throughput in a serving environment.
 
+Run the following under `vllm` source code root folder:
+
 ## baseline
 Server:
 ```