jerryzh168 commited on
Commit f38ad3d · verified · 1 Parent(s): 8f04138

Update README.md

Files changed (1):
  1. README.md +6 -4
README.md CHANGED

@@ -1,6 +1,8 @@
 ---
 library_name: transformers
-tags: []
+tags:
+- torchao
+license: mit
 ---
 
 [Phi4-mini](https://huggingface.co/microsoft/Phi-4-mini-instruct) model quantized with [torchao](https://huggingface.co/docs/transformers/main/en/quantization/torchao) float8 dynamic activation and float8 weight quantization (per row granularity), by PyTorch team.

@@ -21,7 +23,7 @@ quantized_model = AutoModelForCausalLM.from_pretrained(model_id, device_map="aut
 
 # Push to hub
 USER_ID = "YOUR_USER_ID"
-save_to = "{USER_ID}/{model_id}-int4wo"
+save_to = "{USER_ID}/{model_id}-float8dq"
 quantized_model.push_to_hub(save_to, safe_serialization=False)
 
 

@@ -136,11 +138,11 @@ vllm serve jerryzh168/phi4-mini-float8dq --tokenizer microsoft/Phi-4-mini-instru
 
 Client:
 ```
-python benchmarks/benchmark_serving.py --backend vllm --dataset-name sharegpt --tokenizer microsoft/Phi-4-mini-instruct --dataset-path ./ShareGPT_V3_unfiltered_cleaned_split.json --model jerryzh168/phi4-mini-int4wo-hqq --num-prompts 1
+python benchmarks/benchmark_serving.py --backend vllm --dataset-name sharegpt --tokenizer microsoft/Phi-4-mini-instruct --dataset-path ./ShareGPT_V3_unfiltered_cleaned_split.json --model jerryzh168/phi4-mini-float8dq --num-prompts 1
 ```
 
 # Serving with vllm
 We can use the same command we used in serving benchmarks to serve the model with vllm
 ```
 vllm serve jerryzh168/phi4-mini-float8dq --tokenizer microsoft/Phi-4-mini-instruct -O3
-```
+```
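The "per row granularity" in the README description can be illustrated with a small numeric sketch. This is *not* the torchao implementation — real float8 (e4m3) values have nonuniformly spaced steps and the quantized kernels run on GPU — but it shows the core idea: each weight row gets its own dynamic scale, chosen so the row's largest magnitude maps to the e4m3 maximum of 448.

```python
import numpy as np

F8_E4M3_MAX = 448.0  # largest finite value representable in float8 e4m3


def quantize_per_row(w):
    """Per-row dynamic quantization sketch: one scale per row."""
    # Scale so each row's absolute max lands exactly at the f8 max.
    scale = np.abs(w).max(axis=1, keepdims=True) / F8_E4M3_MAX
    # Round and clamp into the representable range (uniform rounding
    # here; real e4m3 rounding is nonuniform).
    q = np.clip(np.round(w / scale), -F8_E4M3_MAX, F8_E4M3_MAX)
    return q, scale


def dequantize(q, scale):
    return q * scale


np.random.seed(0)
w = np.random.randn(4, 8).astype(np.float32)
q, scale = quantize_per_row(w)
w_hat = dequantize(q, scale)
# Per-element reconstruction error is bounded by half a quantization
# step, i.e. 0.5 * scale for that row.
max_err = np.max(np.abs(w - w_hat))
```

Because the scale is recomputed from each row (and, for activations, from each incoming batch at runtime), outlier rows do not degrade the precision of the rest of the tensor — that is the motivation for per-row rather than per-tensor granularity.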