# Qwen3-Embedding-8B-W4A16-G128

A GPTQ-quantized version of https://huggingface.co/Qwen/Qwen3-Embedding-8B, using THUIR/T2Ranking and m-a-p/COIG-CQIA as the calibration set.

## What's the benefit?

VRAM usage drops from more than 24 GB to 19,624 MB (without FlashAttention-2), so the model fits on a single 24 GB card such as an RTX 3090 or 4090.
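As a rough sanity check on where the savings come from (a back-of-the-envelope sketch, not an exact accounting — the 19,624 MB figure also includes activations and any layers left unquantized):

```python
# Rough weight-memory estimate for an ~8B-parameter model.
params = 8e9

bf16_bytes = params * 2              # BF16: 16 bits per weight
w4_bytes = params * 0.5              # W4A16: 4 bits per weight
# G128 adds roughly one BF16 scale per group of 128 weights:
w4_overhead = (params / 128) * 2

print(f"BF16 weights : {bf16_bytes / 2**30:.1f} GiB")
print(f"W4A16 weights: {(w4_bytes + w4_overhead) / 2**30:.1f} GiB")
```

Weights alone shrink roughly 4x; the remaining footprint at inference time is dominated by activations and dequantization buffers.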

## What's the cost?

About a 0.81% relative drop in the C-MTEB Mean(Task) score.

| C-MTEB | Param. | Mean(Task) | Mean(Type) | Class. | Clust. | Pair Class. | Rerank. | Retr. | STS |
|---|---|---|---|---|---|---|---|---|---|
| multilingual-e5-large-instruct | 0.6B | 58.08 | 58.24 | 69.80 | 48.23 | 64.52 | 57.45 | 63.65 | 45.81 |
| bge-multilingual-gemma2 | 9B | 67.64 | 75.31 | 59.30 | 86.67 | 68.28 | 73.73 | 55.19 | - |
| gte-Qwen2-1.5B-instruct | 1.5B | 67.12 | 67.79 | 72.53 | 54.61 | 79.50 | 68.21 | 71.86 | 60.05 |
| gte-Qwen2-7B-instruct | 7.6B | 71.62 | 72.19 | 75.77 | 66.06 | 81.16 | 69.24 | 75.70 | 65.20 |
| ritrieve_zh_v1 | 0.3B | 72.71 | 73.85 | 76.88 | 66.50 | 85.98 | 72.86 | 76.97 | 63.92 |
| Qwen3-Embedding-8B | 8B | 73.84 | 75.00 | 76.97 | 80.08 | 84.23 | 66.99 | 78.21 | 63.53 |
| This Model | 8B (W4A16) | 73.24 | 74.38 | 76.85 | 79.58 | 83.21 | 66.43 | 77.39 | 62.80 |
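The headline loss figure can be reproduced from the Mean(Task) column above:

```python
# C-MTEB Mean(Task) scores from the table above.
fp_score = 73.84   # Qwen3-Embedding-8B (original BF16)
q_score = 73.24    # this model (W4A16-G128)

relative_drop = (fp_score - q_score) / fp_score * 100
print(f"Relative drop: {relative_drop:.2f}%")  # ≈ 0.81%
```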

## How to use it?

Install compressed-tensors, optimum, and either auto-gptq or gptqmodel via pip, then follow the official Qwen3-Embedding usage guide.
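Once the checkpoint is loaded, the embedding step itself comes down to last-token pooling over the final hidden states followed by L2 normalization, as in the official usage guide. A minimal NumPy sketch of that pooling step (dummy tensors stand in for real model outputs; the shapes and right-padding layout are illustrative assumptions):

```python
import numpy as np

def last_token_pool(hidden: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Pick the hidden state of each sequence's last non-padding token.

    hidden: (batch, seq_len, dim) final-layer hidden states
    mask:   (batch, seq_len) attention mask, 1 = real token (right padding)
    """
    last_idx = mask.sum(axis=1) - 1  # index of each sequence's last real token
    pooled = hidden[np.arange(hidden.shape[0]), last_idx]
    # L2-normalize so plain dot products become cosine similarities.
    return pooled / np.linalg.norm(pooled, axis=1, keepdims=True)

# Dummy batch: 2 sequences, length 4, hidden size 8.
rng = np.random.default_rng(0)
hidden = rng.normal(size=(2, 4, 8))
mask = np.array([[1, 1, 1, 0],   # last real token at position 2
                 [1, 1, 1, 1]])  # last real token at position 3
emb = last_token_pool(hidden, mask)
print(emb.shape)    # (2, 8)
print(emb @ emb.T)  # cosine-similarity matrix
```

With the real model, `hidden` comes from the forward pass of the quantized checkpoint; nothing about the pooling changes versus the original BF16 model.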

