cpatonn/OpenReasoning-Nemotron-32B-W8A8-INT8-Dynamic

8-bit precision

compressed-tensors


cpatonn committed on Jul 19

Commit 3fa2e79 · verified · 1 Parent(s): 03892c1

Update README.md

Files changed (1)

README.md +16 -3

README.md CHANGED

@@ -1,3 +1,16 @@
----
-license: apache-2.0
----

+---
+license: apache-2.0
+base_model:
+- nvidia/OpenReasoning-Nemotron-32B
+datasets:
+- HuggingFaceH4/ultrachat_200k
+---
+# Method
+Quantised using [vllm-project/llm-compressor](https://github.com/vllm-project/llm-compressor.git) and the following configs:
+```
+recipe = [
+    SmoothQuantModifier(smoothing_strength=0.8),
+    GPTQModifier(targets="Linear", scheme="W8A8", ignore=["lm_head"]),
+]
+```
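
The recipe above pairs SmoothQuant with 8-bit weight and activation quantisation. As a rough illustration of what `smoothing_strength=0.8` controls, here is a minimal pure-Python sketch of the SmoothQuant rescaling idea on a toy layer. It is not llm-compressor's implementation: the helper names (`smoothquant_scales`, `quantize_int8`) and the toy numbers are hypothetical, and the per-tensor symmetric INT8 rounding is a simplifying assumption.

```python
def smoothquant_scales(act_absmax, weight_absmax, alpha=0.8):
    """Per-input-channel scale s_j = a_j**alpha / w_j**(1 - alpha).

    alpha plays the role of smoothing_strength: values closer to 1 move
    more of the quantisation difficulty from activations onto weights.
    """
    return [a ** alpha / w ** (1.0 - alpha)
            for a, w in zip(act_absmax, weight_absmax)]


def quantize_int8(values):
    """Symmetric INT8 with one scale per list (a simplifying assumption)."""
    scale = max(abs(v) for v in values) / 127.0 or 1.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


# Toy layer with two input channels and one output; channel 0 carries
# an activation outlier (hypothetical numbers).
x = [40.0, 0.5]
w = [0.02, 0.8]

s = smoothquant_scales([abs(v) for v in x], [abs(v) for v in w], alpha=0.8)
x_s = [xi / si for xi, si in zip(x, s)]  # smoothed activations
w_s = [wi * si for wi, si in zip(w, s)]  # compensated weights

# The rescaling leaves the layer output unchanged up to float error.
y_ref = sum(a * b for a, b in zip(x, w))
y_smooth = sum(a * b for a, b in zip(x_s, w_s))

q_x, x_scale = quantize_int8(x_s)
q_w, w_scale = quantize_int8(w_s)
```

In this toy case the activation outlier shrinks after smoothing, so a single INT8 scale covers both channels with less rounding loss, while the weights absorb the inverse factor.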