Update README.md
Browse files
README.md
CHANGED
|
@@ -24,11 +24,6 @@ Need to install vllm nightly to get some recent changes:
|
|
| 24 |
```
|
| 25 |
pip install vllm --pre --extra-index-url https://wheels.vllm.ai/nightly
|
| 26 |
```
|
| 27 |
-
## Command Line
|
| 28 |
-
Then we can serve with the following command:
|
| 29 |
-
```
|
| 30 |
-
vllm serve pytorch/Phi-4-mini-instruct-int4wo-hqq --tokenizer microsoft/Phi-4-mini-instruct -O3
|
| 31 |
-
```
|
| 32 |
|
| 33 |
## Code Example
|
| 34 |
```
|
|
@@ -52,6 +47,12 @@ output = llm.chat(messages=messages, sampling_params=sampling_params)
|
|
| 52 |
print(output[0].outputs[0].text)
|
| 53 |
```
|
| 54 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 55 |
# Inference with Transformers
|
| 56 |
|
| 57 |
Install the required packages:
|
|
@@ -162,18 +163,6 @@ output_text = tokenizer.batch_decode(
|
|
| 162 |
print("Response:", output_text[0][len(prompt):])
|
| 163 |
```
|
| 164 |
|
| 165 |
-
# Serving with vllm
|
| 166 |
-
|
| 167 |
-
Need to install vllm nightly to get some recent changes:
|
| 168 |
-
```
|
| 169 |
-
pip install vllm --pre --extra-index-url https://wheels.vllm.ai/nightly
|
| 170 |
-
```
|
| 171 |
-
|
| 172 |
-
Then we can serve with the following command:
|
| 173 |
-
```
|
| 174 |
-
vllm serve pytorch/Phi-4-mini-instruct-float8dq --tokenizer microsoft/Phi-4-mini-instruct -O3
|
| 175 |
-
```
|
| 176 |
-
|
| 177 |
# Model Quality
|
| 178 |
We rely on [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) to evaluate the quality of the quantized model.
|
| 179 |
Need to install lm-eval from source:
|
|
|
|
| 24 |
```
|
| 25 |
pip install vllm --pre --extra-index-url https://wheels.vllm.ai/nightly
|
| 26 |
```
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 27 |
|
| 28 |
## Code Example
|
| 29 |
```
|
|
|
|
| 47 |
print(output[0].outputs[0].text)
|
| 48 |
```
|
| 49 |
|
| 50 |
+
## Serving
|
| 51 |
+
Then we can serve with the following command:
|
| 52 |
+
```
|
| 53 |
+
vllm serve pytorch/Phi-4-mini-instruct-int4wo-hqq --tokenizer microsoft/Phi-4-mini-instruct -O3
|
| 54 |
+
```
|
| 55 |
+
|
| 56 |
# Inference with Transformers
|
| 57 |
|
| 58 |
Install the required packages:
|
|
|
|
| 163 |
print("Response:", output_text[0][len(prompt):])
|
| 164 |
```
|
| 165 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 166 |
# Model Quality
|
| 167 |
We rely on [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) to evaluate the quality of the quantized model.
|
| 168 |
Need to install lm-eval from source:
|