# prudant/Qwen3-Embedding-4B-W4A16_ASYM

This is a compressed version of Qwen/Qwen3-Embedding-4B, produced with llm-compressor using the W4A16_ASYM quantization scheme.
Important: You MUST read the usage guide for this model (Guide) before deploying it. In particular, check the pooling configuration in vLLM and the best pooling mode for this model.
## Model Details
- Original Model: Qwen/Qwen3-Embedding-4B
- Quantization Method: AWQ
- Compression Libraries: llm-compressor
- Calibration Dataset: HuggingFaceH4/ultrachat_200k (1024 samples)
- Optimized For: Inference with vLLM
- License: same as original model
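Since the model is optimized for vLLM inference, a minimal deployment sketch may help. The commands below are assumptions based on vLLM's general embedding support, not instructions from this card; the correct pooling mode must be taken from the guide linked above.

```shell
# Hedged sketch: serve the quantized checkpoint as an OpenAI-compatible
# embedding endpoint with vLLM. Flag names follow recent vLLM releases;
# verify the pooling configuration against the guide before relying on it.
vllm serve prudant/Qwen3-Embedding-4B-W4A16_ASYM --task embed

# Query the running server (default port 8000):
curl http://localhost:8000/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{"model": "prudant/Qwen3-Embedding-4B-W4A16_ASYM", "input": "hello world"}'
```

The server requires a GPU with enough memory for the 4-bit weights; the W4A16 scheme keeps activations in 16-bit while storing weights in 4-bit, which is why vLLM can load it without further conversion.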