prudant/Qwen3-Reranker-4B-seq-cls-vllm-fixed-W4A16_ASYM

This is a compressed version of danielchalef/Qwen3-Reranker-4B-seq-cls-vllm-fixed, quantized with llm-compressor using the W4A16_ASYM scheme (4-bit asymmetric weights, 16-bit activations).

Serving

```shell
python3 -m vllm.entrypoints.openai.api_server \
  --model 'dolfsai/Qwen3-Reranker-4B-seq-cls-vllm-W4A16_ASYM' \
  --task classify
```

Important: you MUST read the usage guide for this model before serving it (see the Guide link).

(In particular, check the pooling configuration in vLLM and pick the pooling mode best suited to this model.)
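Once the server above is running, query-document pairs can be scored through vLLM's `/classify` route. The sketch below is a minimal client, not an official example: the pair-formatting template in `format_pair` and the `probs` field of the response are assumptions (the guide referenced above documents the template this model actually expects).

```python
import json
import urllib.request

# Hypothetical template: how the exact prompt for this seq-cls reranker should
# be laid out is documented in the model's usage guide; this layout is an
# assumption for illustration only.
def format_pair(query: str, document: str) -> str:
    return f"<Query>: {query}\n<Document>: {document}"

def rerank(query: str, documents: list[str],
           url: str = "http://localhost:8000/classify") -> list[float]:
    """POST formatted pairs to the vLLM server and return one score per document."""
    payload = {
        "model": "dolfsai/Qwen3-Reranker-4B-seq-cls-vllm-W4A16_ASYM",
        "input": [format_pair(query, d) for d in documents],
    }
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # Field names ("data", "probs") assume vLLM's classification response shape.
    return [item["probs"][0] for item in body["data"]]

if __name__ == "__main__":
    print(format_pair("what is awq", "AWQ is a weight-only quantization method."))
```

Sort the returned scores descending to obtain the reranked document order.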

Model Details

  • Original Model: danielchalef/Qwen3-Reranker-4B-seq-cls-vllm-fixed
  • Quantization Method: AWQ
  • Compression Libraries: llm-compressor
  • Calibration Dataset: ultrachat_200k (512 samples)
  • Optimized For: Inference with vLLM
  • License: same as original model
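A compression run matching the details above can be expressed as an llm-compressor recipe. The author's actual recipe is not published, so the fragment below is only a hedged sketch: the `AWQModifier` name, `targets`, and `ignore` entries are assumptions.

```yaml
# Hypothetical llm-compressor recipe approximating this checkpoint's scheme.
# Modifier name and options are assumptions, not the author's actual recipe.
quant_stage:
  quant_modifiers:
    AWQModifier:
      scheme: W4A16_ASYM   # 4-bit asymmetric weights, 16-bit activations
      targets: ["Linear"]
      ignore: ["lm_head"]
```

Such a recipe would be applied in a one-shot calibration pass over ultrachat_200k (512 samples, per the details above).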
Model size: 875M parameters (Safetensors)
Tensor types: I64, I32, BF16

Model tree for dolfsai/Qwen3-Reranker-4B-seq-cls-vllm-W4A16_ASYM

Base model: Qwen/Qwen3-4B-Base (this model is a quantized derivative)