prudant/Qwen3-Embedding-4B-W4A16_ASYM

This is a compressed version of Qwen/Qwen3-Embedding-4B, quantized with llm-compressor using the W4A16_ASYM scheme.

Important: you MUST read the linked Guide for correct usage of this model.

(Check the pooling configuration in vLLM and pick the best pooling mode for this model.)

Model Details

  • Original Model: Qwen/Qwen3-Embedding-4B
  • Quantization Method: AWQ
  • Compression Libraries: llm-compressor
  • Calibration Dataset: HuggingFaceH4/ultrachat_200k (1024 samples)
  • Optimized For: Inference with vLLM
  • License: same as original model
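Since the model is optimized for vLLM inference, a minimal deployment sketch follows. This assumes a recent vLLM build with embedding-task support; consult the Guide above for the correct pooling mode before serving:

```shell
# Serve the quantized embedding model with vLLM's OpenAI-compatible server.
# --task embed enables the embedding (pooling) runner instead of generation.
vllm serve dolfsai/Qwen3-Embedding-4B-W4A16_ASYM --task embed
```

Once running, embeddings can be requested through the OpenAI-compatible `/v1/embeddings` endpoint.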
  • Model Size: 1.26B params (Safetensors; tensor types: I64, I32, BF16)

Model tree for dolfsai/Qwen3-Embedding-4B-W4A16_ASYM

  • Base model: Qwen/Qwen3-4B-Base (this model is one of 6 quantized variants)