Qwen3 Embedding&Reranker GPTQ
Collection (6 items)
GPTQ-quantized Qwen/Qwen3-Reranker-0.6B, using a mix of Ultrachat, THUIR/T2Ranking, and m-a-p/COIG-CQIA as the calibration set.
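The post does not say how the three calibration sources were combined. As a minimal sketch, assuming an equal round-robin mix (the function name `build_calibration_set`, the sample count, and the seed are all hypothetical), one way to merge several text sources into a single reproducible calibration list is:

```python
import random
from itertools import zip_longest

_MISSING = object()  # sentinel for sources that run out early


def build_calibration_set(sources, n_samples, seed=0):
    """Interleave samples from several sources round-robin, then shuffle.

    `sources` maps a dataset name to a list of text samples. Equal
    round-robin weighting is an assumption, not the author's recipe.
    """
    interleaved = []
    # Walk all sources in lockstep; shorter sources yield the sentinel.
    for group in zip_longest(*sources.values(), fillvalue=_MISSING):
        interleaved.extend(s for s in group if s is not _MISSING)
    picked = interleaved[:n_samples]
    # A fixed seed keeps the calibration set reproducible across runs.
    random.Random(seed).shuffle(picked)
    return picked
```

In transformers, `GPTQConfig` accepts a list of strings as its `dataset` argument, so a list like this is one possible input; the quantization tool and bit width the author actually used are not stated here.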
VRAM usage: 3228M -> 2124M (without FlashAttention 2 (FA2); figures taken from the Embedding model's measurements).
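To put the VRAM figures in proportion, the arithmetic below computes the absolute and relative savings. It reuses the post's "M" units as given; the helper name `vram_savings` is hypothetical.

```python
def vram_savings(before, after):
    """Return (amount saved, percent saved) between two VRAM readings."""
    saved = before - after
    return saved, 100.0 * saved / before


saved, pct = vram_savings(3228, 2124)
print(f"saved {saved}M ({pct:.1f}%)")  # prints "saved 1104M (34.2%)"
```

That is, the quantized model needs roughly two thirds of the original VRAM in this measurement.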
I expect the accuracy drop to be under 5%; further evaluation is on the way. The Embedding model shows a drop of about 0.7%.
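The post does not state which benchmark the ~0.7% comes from, or whether it is an absolute drop (in score points) or a relative one. As a hedged sketch, the two usual definitions differ as follows; the function name `accuracy_drop` is hypothetical and no real scores are implied.

```python
def accuracy_drop(baseline, quantized):
    """Return (absolute drop in score points, relative drop in percent).

    `baseline` is the full-precision model's score and `quantized` the
    GPTQ model's score on the same benchmark, in the same units.
    """
    absolute = baseline - quantized
    relative = 100.0 * absolute / baseline
    return absolute, relative
```

For scores on a 0 to 100 scale, a relative drop is always smaller than or equal to the absolute drop in points, so stating which one is reported matters when comparing quantized models.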
Install the dependencies with pip install compressed-tensors optimum, plus either auto-gptq or gptqmodel, then follow the official usage guide.
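Before following the usage guide, a quick check like the one below can confirm the packages are importable. It assumes the import names `compressed_tensors`, `optimum`, `auto_gptq`, and `gptqmodel` for the pip packages above, and reads the instructions as "both of the first two, plus at least one GPTQ backend"; the helper `check_environment` is hypothetical.

```python
import importlib.util

# Import names assumed from the pip package names in the instructions.
REQUIRED = ("compressed_tensors", "optimum")
GPTQ_BACKENDS = ("auto_gptq", "gptqmodel")  # one of these is enough


def check_environment():
    """Report which packages are importable and whether setup looks complete."""
    status = {
        name: importlib.util.find_spec(name) is not None
        for name in REQUIRED + GPTQ_BACKENDS
    }
    ready = all(status[n] for n in REQUIRED) and any(status[n] for n in GPTQ_BACKENDS)
    return status, ready


if __name__ == "__main__":
    status, ready = check_environment()
    for name, found in status.items():
        print(f"{name}: {'found' if found else 'missing'}")
    print("ready" if ready else "install the missing packages first")
```

`find_spec` only locates the modules without importing them, so the check stays fast even when heavy CUDA extensions are installed.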