GPTQ-quantized version of https://huggingface.co/Qwen/Qwen3-Embedding-8B, calibrated on THUIR/T2Ranking and m-a-p/COIG-CQIA.
VRAM usage: from more than 24G down to 19624M (without FlashAttention-2, FA2), which makes the model usable on a 3090/4090.
About 0.81% relative loss on C-MTEB (Mean(Task): 73.84 -> 73.24).
C-MTEB | Param. | Mean(Task) | Mean(Type) | Class. | Clust. | Pair Class. | Rerank. | Retr. | STS |
---|---|---|---|---|---|---|---|---|---|
multilingual-e5-large-instruct | 0.6B | 58.08 | 58.24 | 69.80 | 48.23 | 64.52 | 57.45 | 63.65 | 45.81 |
bge-multilingual-gemma2 | 9B | 67.64 | 75.31 | 59.30 | 86.67 | 68.28 | 73.73 | 55.19 | - |
gte-Qwen2-1.5B-instruct | 1.5B | 67.12 | 67.79 | 72.53 | 54.61 | 79.5 | 68.21 | 71.86 | 60.05 |
gte-Qwen2-7B-instruct | 7.6B | 71.62 | 72.19 | 75.77 | 66.06 | 81.16 | 69.24 | 75.70 | 65.20 |
ritrieve_zh_v1 | 0.3B | 72.71 | 73.85 | 76.88 | 66.5 | 85.98 | 72.86 | 76.97 | 63.92 |
Qwen3-Embedding-8B | 8B | 73.84 | 75.00 | 76.97 | 80.08 | 84.23 | 66.99 | 78.21 | 63.53 |
This Model | 8B-W4A16 | 73.24 | 74.38 | 76.85 | 79.58 | 83.21 | 66.43 | 77.39 | 62.80 |
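
The ~0.81% figure appears to be the relative drop in Mean(Task) between Qwen3-Embedding-8B and this model; that reading is an assumption, but it matches the numbers in the table. A minimal sketch that recomputes the relative loss for every column, using only the scores listed above:

```python
# Scores copied from the C-MTEB table (rows "Qwen3-Embedding-8B" and "This Model").
columns = ["Mean(Task)", "Mean(Type)", "Class.", "Clust.",
           "Pair Class.", "Rerank.", "Retr.", "STS"]
original = [73.84, 75.00, 76.97, 80.08, 84.23, 66.99, 78.21, 63.53]
quantized = [73.24, 74.38, 76.85, 79.58, 83.21, 66.43, 77.39, 62.80]


def relative_loss(before: float, after: float) -> float:
    """Relative drop from `before` to `after`, in percent."""
    return (before - after) / before * 100.0


# Assumption: "lost" means relative loss, not absolute points.
losses = {
    name: relative_loss(o, q)
    for name, o, q in zip(columns, original, quantized)
}

for name, loss in losses.items():
    print(f"{name:12s} {loss:.2f}%")
```

Under this reading, the Mean(Task) column comes out at roughly 0.81%, which agrees with the summary line.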
Install the dependencies with `pip install compressed-tensors optimum` plus either `auto-gptq` or `gptqmodel`, then follow the official usage guide.
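
For orientation, here is a minimal sketch of what usage might look like through sentence-transformers, assuming the quantized checkpoint loads the same way as the original Qwen3-Embedding-8B. `MODEL_ID` is a hypothetical placeholder, and `load_model` and `score` are illustrative helpers, not part of any library. The official usage guide remains the reference.

```python
# Hypothetical placeholder: replace with the actual repo id of this quantized model.
MODEL_ID = "<this-quantized-model-repo>"


def load_model(model_id: str = MODEL_ID):
    """Load the checkpoint with sentence-transformers (assumed to be supported)."""
    # Imported lazily so the helper below can be used without the dependency.
    from sentence_transformers import SentenceTransformer

    return SentenceTransformer(model_id)


def score(model, queries, documents):
    """Return a query-by-document similarity matrix."""
    # Assumption: as in the original Qwen3-Embedding card, queries are encoded
    # with the "query" prompt and documents are encoded without a prompt.
    query_embeddings = model.encode(queries, prompt_name="query")
    document_embeddings = model.encode(documents)
    return model.similarity(query_embeddings, document_embeddings)


# Example call (downloads the model, needs a CUDA GPU with enough VRAM):
# model = load_model()
# print(score(model, ["What is the capital of China?"],
#             ["The capital of China is Beijing."]))
```

Keeping `score` independent of how the model was loaded makes it easy to swap in a different loader, for example one based on `transformers` directly, if the sentence-transformers path does not work for the quantized weights.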