jerryzh168 committed on
Commit 926d6d3 · verified · 1 Parent(s): 7d5e958

Update README.md

Files changed (1)
  1. README.md +6 -17
README.md CHANGED
@@ -24,11 +24,6 @@ Need to install vllm nightly to get some recent changes:
  ```
  pip install vllm --pre --extra-index-url https://wheels.vllm.ai/nightly
  ```
- ## Command Line
- Then we can serve with the following command:
- ```
- vllm serve pytorch/Phi-4-mini-instruct-int4wo-hqq --tokenizer microsoft/Phi-4-mini-instruct -O3
- ```
 
  ## Code Example
  ```
@@ -52,6 +47,12 @@ output = llm.chat(messages=messages, sampling_params=sampling_params)
  print(output[0].outputs[0].text)
  ```
 
+ ## Serving
+ Then we can serve with the following command:
+ ```
+ vllm serve pytorch/Phi-4-mini-instruct-int4wo-hqq --tokenizer microsoft/Phi-4-mini-instruct -O3
+ ```
+
  # Inference with Transformers
 
  Install the required packages:
@@ -162,18 +163,6 @@ output_text = tokenizer.batch_decode(
  print("Response:", output_text[0][len(prompt):])
  ```
 
- # Serving with vllm
-
- Need to install vllm nightly to get some recent changes:
- ```
- pip install vllm --pre --extra-index-url https://wheels.vllm.ai/nightly
- ```
-
- Then we can serve with the following command:
- ```
- vllm serve pytorch/Phi-4-mini-instruct-float8dq --tokenizer microsoft/Phi-4-mini-instruct -O3
- ```
-
  # Model Quality
  We rely on [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) to evaluate the quality of the quantized model.
  Need to install lm-eval from source:
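
The `vllm serve` command in the new `## Serving` section exposes an OpenAI-compatible HTTP API (on port 8000 by default), so the served model can be queried with any OpenAI client. A minimal sketch, assuming the server from the diff is running locally with default settings and the `openai` Python package is installed:

```python
# Minimal sketch: query the server started by
#   vllm serve pytorch/Phi-4-mini-instruct-int4wo-hqq --tokenizer microsoft/Phi-4-mini-instruct -O3
# Assumes vllm's default OpenAI-compatible endpoint at http://localhost:8000/v1.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # vllm's OpenAI-compatible endpoint
    api_key="EMPTY",  # vllm accepts any key unless one is configured
)

response = client.chat.completions.create(
    model="pytorch/Phi-4-mini-instruct-int4wo-hqq",  # must match the served model name
    messages=[{"role": "user", "content": "Give me a short introduction to large language models."}],
)
print(response.choices[0].message.content)
```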