Update README.md
README.md CHANGED
@@ -24,11 +24,6 @@ Need to install vllm nightly to get some recent changes:
 ```
 pip install vllm --pre --extra-index-url https://wheels.vllm.ai/nightly
 ```
-## Command Line
-Then we can serve with the following command:
-```
-vllm serve pytorch/Phi-4-mini-instruct-int4wo-hqq --tokenizer microsoft/Phi-4-mini-instruct -O3
-```
 
 ## Code Example
 ```
@@ -52,6 +47,12 @@ output = llm.chat(messages=messages, sampling_params=sampling_params)
 print(output[0].outputs[0].text)
 ```
 
+## Serving
+Then we can serve with the following command:
+```
+vllm serve pytorch/Phi-4-mini-instruct-int4wo-hqq --tokenizer microsoft/Phi-4-mini-instruct -O3
+```
+
 # Inference with Transformers
 
 Install the required packages:
@@ -162,18 +163,6 @@ output_text = tokenizer.batch_decode(
 print("Response:", output_text[0][len(prompt):])
 ```
 
-# Serving with vllm
-
-Need to install vllm nightly to get some recent changes:
-```
-pip install vllm --pre --extra-index-url https://wheels.vllm.ai/nightly
-```
-
-Then we can serve with the following command:
-```
-vllm serve pytorch/Phi-4-mini-instruct-float8dq --tokenizer microsoft/Phi-4-mini-instruct -O3
-```
-
 # Model Quality
 We rely on [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) to evaluate the quality of the quantized model.
 Need to install lm-eval from source:
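
For reference, the `## Code Example` block that the second hunk anchors against is mostly elided by the diff; only `output = llm.chat(...)` and the final `print` survive as context lines. Below is a minimal sketch of how those pieces fit together, assuming vLLM's offline `LLM`/`SamplingParams` API. The model and tokenizer names are taken from the `vllm serve` command in the diff; the sampling settings and prompt are illustrative, not the README's verbatim content.

```python
# Sketch only: reconstructs the shape of the README's elided code example.
# Model/tokenizer names come from the diff; sampling values are assumptions.
from vllm import LLM, SamplingParams

llm = LLM(
    model="pytorch/Phi-4-mini-instruct-int4wo-hqq",
    tokenizer="microsoft/Phi-4-mini-instruct",
)
sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=256)
messages = [
    {"role": "user", "content": "Write a haiku about quantization."},
]
# These two lines appear verbatim as context lines in the diff above.
output = llm.chat(messages=messages, sampling_params=sampling_params)
print(output[0].outputs[0].text)
```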
|
|
24 |
```
|
25 |
pip install vllm --pre --extra-index-url https://wheels.vllm.ai/nightly
|
26 |
```
|
|
|
|
|
|
|
|
|
|
|
27 |
|
28 |
## Code Example
|
29 |
```
|
|
|
47 |
print(output[0].outputs[0].text)
|
48 |
```
|
49 |
|
50 |
+
## Serving
|
51 |
+
Then we can serve with the following command:
|
52 |
+
```
|
53 |
+
vllm serve pytorch/Phi-4-mini-instruct-int4wo-hqq --tokenizer microsoft/Phi-4-mini-instruct -O3
|
54 |
+
```
|
55 |
+
|
56 |
# Inference with Transformers
|
57 |
|
58 |
Install the required packages:
|
|
|
163 |
print("Response:", output_text[0][len(prompt):])
|
164 |
```
|
165 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
166 |
# Model Quality
|
167 |
We rely on [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) to evaluate the quality of the quantized model.
|
168 |
Need to install lm-eval from source:
|
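
Once the `vllm serve` command from the new `## Serving` section is running, vLLM exposes an OpenAI-compatible HTTP endpoint (by default at `http://localhost:8000/v1`). A hedged sketch of querying it with the `openai` client follows; the base URL, placeholder API key, and prompt are assumptions, and the `model` value must match the name passed to `vllm serve`.

```python
# Sketch: query the running `vllm serve` endpoint via its OpenAI-compatible API.
# Assumes vLLM's default host/port; "EMPTY" is a conventional placeholder key.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
response = client.chat.completions.create(
    model="pytorch/Phi-4-mini-instruct-int4wo-hqq",
    messages=[{"role": "user", "content": "Give me a one-line summary of Phi-4-mini."}],
    max_tokens=128,
)
print(response.choices[0].message.content)
```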