prudant/Qwen3-Embedding-4B-W4A16_ASYM

This is a compressed version of Qwen/Qwen3-Embedding-4B, quantized with llm-compressor using the W4A16_ASYM scheme.

Important: you MUST read the linked Guide for correct usage of this model.

(Check the pooling configuration in vLLM and pick the best pooling mode for this model.)

Model Details

  • Original Model: Qwen/Qwen3-Embedding-4B
  • Quantization Method: AWQ
  • Compression Libraries: llm-compressor
  • Calibration Dataset: HuggingFaceH4/ultrachat_200k (1024 samples)
  • Optimized For: Inference with vLLM
  • License: same as original model
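Since the model is optimized for vLLM inference, a minimal deployment sketch follows. This assumes a recent vLLM build with embedding-task support; consult the Guide above for the correct pooling mode before serving:

```shell
# Serve the quantized embedding model with vLLM's OpenAI-compatible server.
# --task embed enables the embedding (pooling) runner instead of generation.
vllm serve dolfsai/Qwen3-Embedding-4B-W4A16_ASYM --task embed
```

Once running, embeddings can be requested through the OpenAI-compatible `/v1/embeddings` endpoint.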
  • Model Size: 1.26B params (Safetensors; tensor types: I64, I32, BF16)

Model tree for dolfsai/Qwen3-Embedding-4B-W4A16_ASYM

  • Base model: Qwen/Qwen3-4B-Base (this model is one of 6 quantized variants)