---
license: apache-2.0
base_model:
- nvidia/OpenReasoning-Nemotron-32B
datasets:
- HuggingFaceH4/ultrachat_200k
---
# OpenReasoning-Nemotron-32B-W8A8-INT8-Dynamic
## Method
Quantised using [vllm-project/llm-compressor](https://github.com/vllm-project/llm-compressor.git) with the following recipe:
```python
from llmcompressor.modifiers.quantization import GPTQModifier
from llmcompressor.modifiers.smoothquant import SmoothQuantModifier

recipe = [
    # Migrate activation outliers into the weights to ease activation quantisation
    SmoothQuantModifier(smoothing_strength=0.8),
    # INT8 weights and activations for all Linear layers, keeping the output head in full precision
    GPTQModifier(targets="Linear", scheme="W8A8", ignore=["lm_head"]),
]
```
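For context, llm-compressor applies a recipe like this through its `oneshot` entry point. The sketch below shows one plausible invocation; the calibration split, sample count, sequence length, and output directory are assumptions not stated in this card (only the calibration dataset, `HuggingFaceH4/ultrachat_200k`, is listed in the metadata above).

```python
from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import GPTQModifier
from llmcompressor.modifiers.smoothquant import SmoothQuantModifier

recipe = [
    SmoothQuantModifier(smoothing_strength=0.8),
    GPTQModifier(targets="Linear", scheme="W8A8", ignore=["lm_head"]),
]

# Calibration settings below are illustrative assumptions, not values stated in this card.
oneshot(
    model="nvidia/OpenReasoning-Nemotron-32B",
    dataset="HuggingFaceH4/ultrachat_200k",
    splits={"calibration": "train_sft[:512]"},  # assumed calibration slice
    recipe=recipe,
    max_seq_length=2048,
    num_calibration_samples=512,
    output_dir="OpenReasoning-Nemotron-32B-W8A8-INT8-Dynamic",
)
```

The resulting checkpoint is saved in compressed-tensors format and can be served directly with vLLM.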