DrTangxc committed (verified)
Commit 63c1cdc · Parent(s): 69e024b

Update model card

Files changed (1): README.md (+26 −1)
@@ -6,4 +6,29 @@ base_model_relation: quantized
 library_name: transformers
 tags:
 - qwq
----
+- fp8
+---
+# Model Overview
+
+## Description
+
+FP8 Quantized QwQ-32B.
+
+## Evaluation
+
+The results in the following table are based on the MMLU benchmark.
+
+To speed up evaluation, we limited the length of the model's chain of thought, so the scores may differ from those obtained with longer reasoning chains.
+
+In our experiments, **the FP8 quantized version is almost as accurate as the BF16 version and can be used for faster inference.**
+
+| Data Format   | MMLU Score |
+|:--------------|:-----------|
+| BF16 Official | 61.2       |
+| FP8 Quantized | 61.2       |
+| Q8_0 (INT8)   | 59.1       |
+| AWQ (INT4)    | 53.4       |
+
+## Model Card Contact
+
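The added model card compares FP8 against BF16 and integer formats. As a rough illustration of why FP8 retains accuracy so well, the sketch below rounds a value to the nearest number representable in the E4M3 format commonly used for FP8 weight quantization (4 exponent bits, 3 mantissa bits, max normal 448). The helper name `quantize_e4m3` is hypothetical; this is a pure-Python illustration, not the kernel used to produce this checkpoint:

```python
import math

def quantize_e4m3(x: float) -> float:
    """Round x to the nearest FP8 E4M3 value (illustrative sketch; ignores NaN).

    E4M3 has 3 mantissa bits (8 steps per binade), exponent bias 7,
    a largest normal value of 448, and subnormals below 2**-6.
    """
    if x == 0.0:
        return 0.0
    sign = math.copysign(1.0, x)
    v = min(abs(x), 448.0)            # clamp to the E4M3 max normal value
    e = max(math.floor(math.log2(v)), -6)  # subnormals share exponent -6
    step = 2.0 ** (e - 3)             # spacing between adjacent E4M3 values
    q = round(v / step) * step        # round to the nearest representable value
    return sign * min(q, 448.0)
```

For normal values the relative rounding error is at most about 6% (half of a 3-bit mantissa step), which is small enough for weight tensors that the MMLU score in the table above is unchanged versus BF16, while the 8-bit and 4-bit integer formats lose measurably more.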