Update README.md
---
library_name: transformers
tags:
- torchao
license: mit
---

[Phi4-mini](https://huggingface.co/microsoft/Phi-4-mini-instruct) model quantized with [torchao](https://huggingface.co/docs/transformers/main/en/quantization/torchao) float8 dynamic activation and float8 weight quantization (per-row granularity), by the PyTorch team.
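A toy sketch of the per-row scaling idea behind float8 dynamic quantization (pure Python, not the actual torchao kernel; it shows only the scale/rescale step and skips the real float8 rounding and casting):

```python
FP8_E4M3_MAX = 448.0  # largest finite value in the float8 e4m3 format

def quantize_per_row(matrix):
    """Per-row scaling: each row gets its own scale so its amax maps to FP8_E4M3_MAX."""
    qrows, scales = [], []
    for row in matrix:
        amax = max(abs(v) for v in row) or 1.0  # avoid dividing by zero on all-zero rows
        scale = amax / FP8_E4M3_MAX
        qrows.append([v / scale for v in row])  # scaled values now fit the fp8 range
        scales.append(scale)
    return qrows, scales

def dequantize(qrows, scales):
    # multiply each row back by its own per-row scale
    return [[v * s for v in row] for row, s in zip(qrows, scales)]

w = [[0.5, -1.0, 2.0], [10.0, -20.0, 5.0]]
q, s = quantize_per_row(w)
roundtrip = dequantize(q, s)
assert all(abs(a - b) < 1e-9 for ra, rb in zip(roundtrip, w) for a, b in zip(ra, rb))
```

Because each row is scaled independently, an outlier in one row does not shrink the usable range of the others, which is the point of per-row granularity.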
# Push to hub
USER_ID = "YOUR_USER_ID"
save_to = f"{USER_ID}/{model_id}-float8dq"
quantized_model.push_to_hub(save_to, safe_serialization=False)
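As an aside on the repo name above: `model_id` carries the `microsoft/` org prefix, so stripping it before formatting keeps the Hub repo id to a single `owner/name` segment. A small self-contained sketch (`YOUR_USER_ID` is a placeholder):

```python
USER_ID = "YOUR_USER_ID"  # placeholder, replace with your Hub username
model_id = "microsoft/Phi-4-mini-instruct"

# keep only the model name so the repo id has one owner and one name segment
save_to = f"{USER_ID}/{model_id.split('/')[-1]}-float8dq"
print(save_to)  # YOUR_USER_ID/Phi-4-mini-instruct-float8dq
```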
Client:
```
python benchmarks/benchmark_serving.py --backend vllm --dataset-name sharegpt --tokenizer microsoft/Phi-4-mini-instruct --dataset-path ./ShareGPT_V3_unfiltered_cleaned_split.json --model jerryzh168/phi4-mini-float8dq --num-prompts 1
```

# Serving with vllm
We can use the same command we used in the serving benchmarks to serve the model with vLLM:
```
vllm serve jerryzh168/phi4-mini-float8dq --tokenizer microsoft/Phi-4-mini-instruct -O3
```
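Once the server is up, vLLM exposes an OpenAI-compatible API (on port 8000 by default); a sketch of a chat completion request against it, assuming the default host and port:

```shell
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "jerryzh168/phi4-mini-float8dq",
    "messages": [{"role": "user", "content": "Hello"}],
    "max_tokens": 32
  }'
```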