jerryzh168 committed
Commit bf1e484 · verified · 1 Parent(s): 265080b

Update README.md

Files changed (1)
  1. README.md +4 -4
README.md CHANGED
@@ -113,7 +113,7 @@ lm_eval --model hf --model_args pretrained=microsoft/Phi-4-mini-instruct --tasks
 
 ## float8dq
 ```
-lm_eval --model hf --model_args pretrained=jerryzh168/phi4-mini-float8dq --tasks hellaswag --device cuda:0 --batch_size 8
+lm_eval --model hf --model_args pretrained=pytorch/Phi-4-mini-instruct-float8dq --tasks hellaswag --device cuda:0 --batch_size 8
 ```
 
 `TODO: more complete eval results`
@@ -163,7 +163,7 @@ python benchmarks/benchmark_latency.py --input-len 256 --output-len 256 --model
 
 ### float8dq
 ```
-python benchmarks/benchmark_latency.py --input-len 256 --output-len 256 --model jerryzh168/phi4-mini-float8dq --batch-size 1
+python benchmarks/benchmark_latency.py --input-len 256 --output-len 256 --model pytorch/Phi-4-mini-instruct-float8dq --batch-size 1
 ```
 
 ## benchmark_serving
@@ -186,7 +186,7 @@ python benchmarks/benchmark_serving.py --backend vllm --dataset-name sharegpt --
 ### float8dq
 Server:
 ```
-vllm serve jerryzh168/phi4-mini-float8dq --tokenizer microsoft/Phi-4-mini-instruct -O3
+vllm serve pytorch/Phi-4-mini-instruct-float8dq --tokenizer microsoft/Phi-4-mini-instruct -O3
 ```
 
 Client:
@@ -197,5 +197,5 @@ python benchmarks/benchmark_serving.py --backend vllm --dataset-name sharegpt --
 # Serving with vllm
 We can use the same command we used in serving benchmarks to serve the model with vllm
 ```
-vllm serve jerryzh168/phi4-mini-float8dq --tokenizer microsoft/Phi-4-mini-instruct -O3
+vllm serve pytorch/Phi-4-mini-instruct-float8dq --tokenizer microsoft/Phi-4-mini-instruct -O3
 ```
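For reference, the updated `vllm serve` command exposes vllm's OpenAI-compatible API once the server is running. A minimal client sketch, assuming vllm's default endpoint at http://localhost:8000/v1 and that the served model name defaults to the path passed to `vllm serve` (neither is stated in this commit):
```
# Hypothetical request against the default local vllm endpoint; adjust host/port if the server is configured differently.
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "pytorch/Phi-4-mini-instruct-float8dq",
        "messages": [{"role": "user", "content": "What is float8 dynamic quantization?"}],
        "max_tokens": 128
      }'
```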