This is Qwen/Qwen3-4B quantized to 4-bit (NVFP4) with LLM Compressor, for both weights and activations. Calibration used 512 samples of 16,000 tokens each, with the chat template applied, drawn from open-r1/OpenR1-Math-220k.

The model was quantized, tested, and evaluated by The Kaitchup. It is compatible with vLLM. Use a Blackwell GPU to get more than 2x throughput.

More details in this article: NVFP4: Same Accuracy with 2.3x Higher Throughput for 4-Bit LLMs

How to Support My Work

Subscribe to The Kaitchup. Or, for a one-time contribution, here is my ko-fi link: https://ko-fi.com/bnjmn_marie

This helps me continue quantizing and evaluating models for free.
