# Qwen3-Embedding-8B-W4A16-G128

A GPTQ-quantized version of https://huggingface.co/Qwen/Qwen3-Embedding-8B, using THUIR/T2Ranking and m-a-p/COIG-CQIA as the calibration set.

## What's the benefit?

VRAM usage drops from more than 24 GB to 19,624 MB (without FlashAttention-2), so the model fits on a single 24 GB card such as an RTX 3090 or 4090.
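As a rough sanity check on where the savings come from (a back-of-the-envelope sketch, not an exact accounting — the 19,624 MB figure also includes activations and any layers left unquantized):

```python
# Rough weight-memory estimate for an ~8B-parameter model.
params = 8e9

bf16_bytes = params * 2              # BF16: 16 bits per weight
w4_bytes = params * 0.5              # W4A16: 4 bits per weight
# G128 adds roughly one BF16 scale per group of 128 weights:
w4_overhead = (params / 128) * 2

print(f"BF16 weights : {bf16_bytes / 2**30:.1f} GiB")
print(f"W4A16 weights: {(w4_bytes + w4_overhead) / 2**30:.1f} GiB")
```

Weights alone shrink roughly 4x; the remaining footprint at inference time is dominated by activations and dequantization buffers.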

## What's the cost?

About a 0.81% relative drop in the C-MTEB Mean(Task) score.

| C-MTEB | Param. | Mean(Task) | Mean(Type) | Class. | Clust. | Pair Class. | Rerank. | Retr. | STS |
|---|---|---|---|---|---|---|---|---|---|
| multilingual-e5-large-instruct | 0.6B | 58.08 | 58.24 | 69.80 | 48.23 | 64.52 | 57.45 | 63.65 | 45.81 |
| bge-multilingual-gemma2 | 9B | 67.64 | 75.31 | 59.30 | 86.67 | 68.28 | 73.73 | 55.19 | - |
| gte-Qwen2-1.5B-instruct | 1.5B | 67.12 | 67.79 | 72.53 | 54.61 | 79.50 | 68.21 | 71.86 | 60.05 |
| gte-Qwen2-7B-instruct | 7.6B | 71.62 | 72.19 | 75.77 | 66.06 | 81.16 | 69.24 | 75.70 | 65.20 |
| ritrieve_zh_v1 | 0.3B | 72.71 | 73.85 | 76.88 | 66.50 | 85.98 | 72.86 | 76.97 | 63.92 |
| Qwen3-Embedding-8B | 8B | 73.84 | 75.00 | 76.97 | 80.08 | 84.23 | 66.99 | 78.21 | 63.53 |
| This Model | 8B (W4A16) | 73.24 | 74.38 | 76.85 | 79.58 | 83.21 | 66.43 | 77.39 | 62.80 |
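The headline loss figure can be reproduced from the Mean(Task) column above:

```python
# C-MTEB Mean(Task) scores from the table above.
fp_score = 73.84   # Qwen3-Embedding-8B (original BF16)
q_score = 73.24    # this model (W4A16-G128)

relative_drop = (fp_score - q_score) / fp_score * 100
print(f"Relative drop: {relative_drop:.2f}%")  # ≈ 0.81%
```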

## How to use it?

Install compressed-tensors, optimum, and either auto-gptq or gptqmodel via pip, then follow the official Qwen3-Embedding usage guide.
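Once the checkpoint is loaded, the embedding step itself comes down to last-token pooling over the final hidden states followed by L2 normalization, as in the official usage guide. A minimal NumPy sketch of that pooling step (dummy tensors stand in for real model outputs; the shapes and right-padding layout are illustrative assumptions):

```python
import numpy as np

def last_token_pool(hidden: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Pick the hidden state of each sequence's last non-padding token.

    hidden: (batch, seq_len, dim) final-layer hidden states
    mask:   (batch, seq_len) attention mask, 1 = real token (right padding)
    """
    last_idx = mask.sum(axis=1) - 1  # index of each sequence's last real token
    pooled = hidden[np.arange(hidden.shape[0]), last_idx]
    # L2-normalize so plain dot products become cosine similarities.
    return pooled / np.linalg.norm(pooled, axis=1, keepdims=True)

# Dummy batch: 2 sequences, length 4, hidden size 8.
rng = np.random.default_rng(0)
hidden = rng.normal(size=(2, 4, 8))
mask = np.array([[1, 1, 1, 0],   # last real token at position 2
                 [1, 1, 1, 1]])  # last real token at position 3
emb = last_token_pool(hidden, mask)
print(emb.shape)    # (2, 8)
print(emb @ emb.T)  # cosine-similarity matrix
```

With the real model, `hidden` comes from the forward pass of the quantized checkpoint; nothing about the pooling changes versus the original BF16 model.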

