Update README.md
---
library_name: transformers
tags:
- torchao
license: mit
---

[Phi4-mini](https://huggingface.co/microsoft/Phi-4-mini-instruct) model quantized with [torchao](https://huggingface.co/docs/transformers/main/en/quantization/torchao) float8 dynamic activation and float8 weight quantization (per-row granularity), by the PyTorch team.
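A toy sketch of the per-row scaling idea behind float8 dynamic quantization (pure Python, not the actual torchao kernel; it shows only the scale/rescale step and skips the real float8 rounding and casting):

```python
FP8_E4M3_MAX = 448.0  # largest finite value in the float8 e4m3 format

def quantize_per_row(matrix):
    """Per-row scaling: each row gets its own scale so its amax maps to FP8_E4M3_MAX."""
    qrows, scales = [], []
    for row in matrix:
        amax = max(abs(v) for v in row) or 1.0  # avoid dividing by zero on all-zero rows
        scale = amax / FP8_E4M3_MAX
        qrows.append([v / scale for v in row])  # scaled values now fit the fp8 range
        scales.append(scale)
    return qrows, scales

def dequantize(qrows, scales):
    # multiply each row back by its own per-row scale
    return [[v * s for v in row] for row, s in zip(qrows, scales)]

w = [[0.5, -1.0, 2.0], [10.0, -20.0, 5.0]]
q, s = quantize_per_row(w)
roundtrip = dequantize(q, s)
assert all(abs(a - b) < 1e-9 for ra, rb in zip(roundtrip, w) for a, b in zip(ra, rb))
```

Because each row is scaled independently, an outlier in one row does not shrink the usable range of the others, which is the point of per-row granularity.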
# Push to hub
USER_ID = "YOUR_USER_ID"
save_to = f"{USER_ID}/{model_id}-float8dq"
quantized_model.push_to_hub(save_to, safe_serialization=False)
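As an aside on the repo name above: `model_id` carries the `microsoft/` org prefix, so stripping it before formatting keeps the Hub repo id to a single `owner/name` segment. A small self-contained sketch (`YOUR_USER_ID` is a placeholder):

```python
USER_ID = "YOUR_USER_ID"  # placeholder, replace with your Hub username
model_id = "microsoft/Phi-4-mini-instruct"

# keep only the model name so the repo id has one owner and one name segment
save_to = f"{USER_ID}/{model_id.split('/')[-1]}-float8dq"
print(save_to)  # YOUR_USER_ID/Phi-4-mini-instruct-float8dq
```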
Client:
```
python benchmarks/benchmark_serving.py --backend vllm --dataset-name sharegpt --tokenizer microsoft/Phi-4-mini-instruct --dataset-path ./ShareGPT_V3_unfiltered_cleaned_split.json --model jerryzh168/phi4-mini-float8dq --num-prompts 1
```

# Serving with vllm
We can use the same command we used in the serving benchmarks to serve the model with vLLM:
```
vllm serve jerryzh168/phi4-mini-float8dq --tokenizer microsoft/Phi-4-mini-instruct -O3
```
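Once the server is up, vLLM exposes an OpenAI-compatible API (on port 8000 by default); a sketch of a chat completion request against it, assuming the default host and port:

```shell
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "jerryzh168/phi4-mini-float8dq",
    "messages": [{"role": "user", "content": "Hello"}],
    "max_tokens": 32
  }'
```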