Update README.md
README.md
@@ -113,7 +113,7 @@ lm_eval --model hf --model_args pretrained=microsoft/Phi-4-mini-instruct --tasks
 
 ## float8dq
 ```
-lm_eval --model hf --model_args pretrained=
+lm_eval --model hf --model_args pretrained=pytorch/Phi-4-mini-instruct-float8dq --tasks hellaswag --device cuda:0 --batch_size 8
 ```
 
 `TODO: more complete eval results`
@@ -163,7 +163,7 @@ python benchmarks/benchmark_latency.py --input-len 256 --output-len 256 --model
 
 ### float8dq
 ```
-python benchmarks/benchmark_latency.py --input-len 256 --output-len 256 --model
+python benchmarks/benchmark_latency.py --input-len 256 --output-len 256 --model pytorch/Phi-4-mini-instruct-float8dq --batch-size 1
 ```
 
 ## benchmark_serving
@@ -186,7 +186,7 @@ python benchmarks/benchmark_serving.py --backend vllm --dataset-name sharegpt --
 ### float8dq
 Server:
 ```
-vllm serve
+vllm serve pytorch/Phi-4-mini-instruct-float8dq --tokenizer microsoft/Phi-4-mini-instruct -O3
 ```
 
 Client:
@@ -197,5 +197,5 @@ python benchmarks/benchmark_serving.py --backend vllm --dataset-name sharegpt --
 # Serving with vllm
 We can use the same command we used in serving benchmarks to serve the model with vllm
 ```
-vllm serve
+vllm serve pytorch/Phi-4-mini-instruct-float8dq --tokenizer microsoft/Phi-4-mini-instruct -O3
 ```