---
license: apache-2.0
base_model:
- nvidia/OpenReasoning-Nemotron-32B
datasets:
- HuggingFaceH4/ultrachat_200k
---
# OpenReasoning-Nemotron-32B-W8A8-INT8-Dynamic
## Method
Quantised using [vllm-project/llm-compressor](https://github.com/vllm-project/llm-compressor.git) with the following recipe:
```python
from llmcompressor.modifiers.quantization import GPTQModifier
from llmcompressor.modifiers.smoothquant import SmoothQuantModifier

recipe = [
    # Migrate activation outliers into the weights to ease activation quantisation
    SmoothQuantModifier(smoothing_strength=0.8),
    # INT8 weights and activations for all Linear layers, keeping the output head in full precision
    GPTQModifier(targets="Linear", scheme="W8A8", ignore=["lm_head"]),
]
```
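For context, llm-compressor applies a recipe like this through its `oneshot` entry point. The sketch below shows one plausible invocation; the calibration split, sample count, sequence length, and output directory are assumptions not stated in this card (only the calibration dataset, `HuggingFaceH4/ultrachat_200k`, is listed in the metadata above).

```python
from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import GPTQModifier
from llmcompressor.modifiers.smoothquant import SmoothQuantModifier

recipe = [
    SmoothQuantModifier(smoothing_strength=0.8),
    GPTQModifier(targets="Linear", scheme="W8A8", ignore=["lm_head"]),
]

# Calibration settings below are illustrative assumptions, not values stated in this card.
oneshot(
    model="nvidia/OpenReasoning-Nemotron-32B",
    dataset="HuggingFaceH4/ultrachat_200k",
    splits={"calibration": "train_sft[:512]"},  # assumed calibration slice
    recipe=recipe,
    max_seq_length=2048,
    num_calibration_samples=512,
    output_dir="OpenReasoning-Nemotron-32B-W8A8-INT8-Dynamic",
)
```

The resulting checkpoint is saved in compressed-tensors format and can be served directly with vLLM.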