Spaces: optimum/llm-perf-leaderboard
jerryzh168 committed on Dec 7, 2024

Commit 3795233 (1 parent: 51a4daf)

Add torchao int4 weight only quantization as an option

Summary:
This is a follow-up to https://github.com/huggingface/optimum-benchmark/pull/297/
that makes torchao available as an option on the leaderboard.

Eventually we also want to add torchao.autoquant as an option, once it is
integrated into TorchAoConfig.

Test Plan:
leaderboard?


Files changed (3)

hardware.yaml +2 -1
src/panel.py +1 -1
src/utils.py +5 -0

hardware.yaml

@@ -19,6 +19,7 @@
     - awq
     - bnb
     - gptq
+    - torchao
   backends:
     - pytorch
@@ -45,4 +46,4 @@
   backends:
     - pytorch
     - openvino
-    - onnxruntime
+    - onnxruntime

src/panel.py

@@ -26,7 +26,7 @@ def create_control_panel(
     if hardware_provider == "nvidia":
         backends = ["pytorch"]
         attention_implementations = ["Eager", "SDPA", "FAv2"]
-        quantizations = ["Unquantized", "BnB.4bit", "BnB.8bit", "AWQ.4bit", "GPTQ.4bit"]
+        quantizations = ["Unquantized", "BnB.4bit", "BnB.8bit", "AWQ.4bit", "GPTQ.4bit", "torchao.4bit"]
         kernels = [
             "No Kernel",
             "GPTQ.ExllamaV1",
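
The new "torchao.4bit" entry matters only if the panel's choice list is used to decide which rows are shown. The sketch below illustrates that idea under an assumption: that the leaderboard keeps a row when its quantization label is among the selected choices. The `filter_rows` helper, the `"quantization"` key, and the sample rows are hypothetical and are not taken from src/panel.py; only the choice list is copied from the diff.

```python
# Choice list copied from the "+" line of the src/panel.py diff.
NVIDIA_QUANTIZATIONS = [
    "Unquantized",
    "BnB.4bit",
    "BnB.8bit",
    "AWQ.4bit",
    "GPTQ.4bit",
    "torchao.4bit",
]


def filter_rows(rows, selected_quantizations):
    """Keep only rows whose quantization label is among the selected choices.

    Hypothetical helper: the real filtering code is not part of this commit.
    """
    selected = set(selected_quantizations)
    return [row for row in rows if row["quantization"] in selected]


# Made-up sample rows, for illustration only.
rows = [
    {"model": "model-a", "quantization": "Unquantized"},
    {"model": "model-b", "quantization": "torchao.4bit"},
]

# Choice list as it was before this commit (no torchao entry).
old_choices = [c for c in NVIDIA_QUANTIZATIONS if c != "torchao.4bit"]

visible_before = filter_rows(rows, old_choices)
visible_after = filter_rows(rows, NVIDIA_QUANTIZATIONS)

print([r["model"] for r in visible_before])  # ['model-a']
print([r["model"] for r in visible_after])   # ['model-a', 'model-b']
```

If filtering works this way, a label produced by the data-processing code but missing from the choice list could never be selected, which would explain why the panel and the labeling function are changed together.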

src/utils.py

@@ -70,6 +70,11 @@ def process_quantizations(x):
         and x["config.backend.quantization_config.bits"] == 4
     ):
         return "AWQ.4bit"
+    elif (
+            x["config.backend.quantization_scheme"] == "torchao"
+            and x["config.backend.quantization_config.quant_type"] == "int4_weight_only"
+    ):
+        return "torchao.4bit"
     else:
         return "Unquantized"
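
The new branch can be tried in isolation with the minimal sketch below. It assumes the dotted keys come from flattening a nested benchmark config; `flatten`, `label_quantization`, and the sample configs are hypothetical names and data introduced for illustration. Only the torchao branch and the "Unquantized" fallback are reproduced. The other schemes handled by `process_quantizations` are omitted, and `.get` is used instead of indexing so that a missing key does not raise.

```python
def flatten(config, prefix=""):
    """Flatten nested dicts into one dict with dotted keys."""
    flat = {}
    for key, value in config.items():
        dotted = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, dotted))
        else:
            flat[dotted] = value
    return flat


def label_quantization(row):
    """Return a display label for one flattened benchmark row.

    Hypothetical, simplified stand-in for process_quantizations:
    only the torchao branch from the diff is reproduced here.
    """
    scheme = row.get("config.backend.quantization_scheme")
    quant_type = row.get("config.backend.quantization_config.quant_type")
    if scheme == "torchao" and quant_type == "int4_weight_only":
        return "torchao.4bit"
    return "Unquantized"


def make_config(quant_type):
    """Build a made-up nested config with the given torchao quant_type."""
    return {
        "config": {
            "backend": {
                "name": "pytorch",
                "quantization_scheme": "torchao",
                "quantization_config": {"quant_type": quant_type},
            }
        }
    }


int4_row = flatten(make_config("int4_weight_only"))
other_row = flatten(make_config("int8_weight_only"))
plain_row = flatten({"config": {"backend": {"name": "pytorch"}}})

print(label_quantization(int4_row))   # torchao.4bit
print(label_quantization(other_row))  # Unquantized
print(label_quantization(plain_row))  # Unquantized
```

In this sketch a torchao run with any `quant_type` other than `int4_weight_only` falls through to "Unquantized". The branch added in the diff has the same shape, so other torchao modes would need their own branches to get a distinct label.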