Update README.md
README.md CHANGED
@@ -24,11 +24,6 @@ Need to install vllm nightly to get some recent changes:
 ```
 pip install vllm --pre --extra-index-url https://wheels.vllm.ai/nightly
 ```
-## Command Line
-Then we can serve with the following command:
-```
-vllm serve pytorch/Phi-4-mini-instruct-int4wo-hqq --tokenizer microsoft/Phi-4-mini-instruct -O3
-```
 
 ## Code Example
 ```
@@ -52,6 +47,12 @@ output = llm.chat(messages=messages, sampling_params=sampling_params)
 print(output[0].outputs[0].text)
 ```
 
+## Serving
+Then we can serve with the following command:
+```
+vllm serve pytorch/Phi-4-mini-instruct-int4wo-hqq --tokenizer microsoft/Phi-4-mini-instruct -O3
+```
+
 # Inference with Transformers
 
 Install the required packages:
@@ -162,18 +163,6 @@ output_text = tokenizer.batch_decode(
 print("Response:", output_text[0][len(prompt):])
 ```
 
-# Serving with vllm
-
-Need to install vllm nightly to get some recent changes:
-```
-pip install vllm --pre --extra-index-url https://wheels.vllm.ai/nightly
-```
-
-Then we can serve with the following command:
-```
-vllm serve pytorch/Phi-4-mini-instruct-float8dq --tokenizer microsoft/Phi-4-mini-instruct -O3
-```
-
 # Model Quality
 We rely on [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) to evaluate the quality of the quantized model.
 Need to install lm-eval from source:
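
For reference, the `## Code Example` block that the second hunk anchors against is mostly elided by the diff; only `output = llm.chat(...)` and the final `print` survive as context lines. Below is a minimal sketch of how those pieces fit together, assuming vLLM's offline `LLM`/`SamplingParams` API. The model and tokenizer names are taken from the `vllm serve` command in the diff; the sampling settings and prompt are illustrative, not the README's verbatim content.

```python
# Sketch only: reconstructs the shape of the README's elided code example.
# Model/tokenizer names come from the diff; sampling values are assumptions.
from vllm import LLM, SamplingParams

llm = LLM(
    model="pytorch/Phi-4-mini-instruct-int4wo-hqq",
    tokenizer="microsoft/Phi-4-mini-instruct",
)
sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=256)
messages = [
    {"role": "user", "content": "Write a haiku about quantization."},
]
# These two lines appear verbatim as context lines in the diff above.
output = llm.chat(messages=messages, sampling_params=sampling_params)
print(output[0].outputs[0].text)
```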
|
|
24 |
```
|
25 |
pip install vllm --pre --extra-index-url https://wheels.vllm.ai/nightly
|
26 |
```
|
|
|
|
|
|
|
|
|
|
|
27 |
|
28 |
## Code Example
|
29 |
```
|
|
|
47 |
print(output[0].outputs[0].text)
|
48 |
```
|
49 |
|
50 |
+
## Serving
|
51 |
+
Then we can serve with the following command:
|
52 |
+
```
|
53 |
+
vllm serve pytorch/Phi-4-mini-instruct-int4wo-hqq --tokenizer microsoft/Phi-4-mini-instruct -O3
|
54 |
+
```
|
55 |
+
|
56 |
# Inference with Transformers
|
57 |
|
58 |
Install the required packages:
|
|
|
163 |
print("Response:", output_text[0][len(prompt):])
|
164 |
```
|
165 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
166 |
# Model Quality
|
167 |
We rely on [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) to evaluate the quality of the quantized model.
|
168 |
Need to install lm-eval from source:
|
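
Once the `vllm serve` command from the new `## Serving` section is running, vLLM exposes an OpenAI-compatible HTTP endpoint (by default at `http://localhost:8000/v1`). A hedged sketch of querying it with the `openai` client follows; the base URL, placeholder API key, and prompt are assumptions, and the `model` value must match the name passed to `vllm serve`.

```python
# Sketch: query the running `vllm serve` endpoint via its OpenAI-compatible API.
# Assumes vLLM's default host/port; "EMPTY" is a conventional placeholder key.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
response = client.chat.completions.create(
    model="pytorch/Phi-4-mini-instruct-int4wo-hqq",
    messages=[{"role": "user", "content": "Give me a one-line summary of Phi-4-mini."}],
    max_tokens=128,
)
print(response.choices[0].message.content)
```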