---
license: apache-2.0
datasets:
- HuggingFaceH4/ultrachat_200k
language:
- en
- es
base_model:
- Qwen/Qwen3-Embedding-0.6B
pipeline_tag: feature-extraction
---
|
|
|
# prudant/Qwen3-Embedding-0.6B-W8A8 |
|
|
|
This is a W8A8-compressed version of Qwen/Qwen3-Embedding-0.6B, quantized with [llm-compressor](https://github.com/vllm-project/llm-compressor).
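As a rough illustration, a W8A8 GPTQ run with llm-compressor typically looks like the sketch below. This is an assumption-laden reconstruction, not the exact recipe used for this checkpoint; argument names and defaults can vary between llm-compressor versions.

```python
# Hypothetical sketch of a W8A8 GPTQ compression run with llm-compressor.
# The recipe details (targets, ignore list, sequence length) are assumptions;
# only the scheme, dataset, and sample count come from this model card.
from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import GPTQModifier

recipe = GPTQModifier(
    targets="Linear",    # quantize the Linear layers
    scheme="W8A8",       # 8-bit weights, 8-bit activations
    ignore=["lm_head"],  # commonly kept in full precision
)

oneshot(
    model="Qwen/Qwen3-Embedding-0.6B",
    dataset="ultrachat_200k",      # calibration dataset from this card
    recipe=recipe,
    num_calibration_samples=1024,  # sample count from this card
    max_seq_length=2048,           # assumed calibration sequence length
)
```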
|
|
|
**Important**: You MUST read the following guide for correct usage of this model: [Guide](https://github.com/vllm-project/vllm/pull/19260)
|
|
|
## Model Details |
|
|
|
- **Original Model**: Qwen/Qwen3-Embedding-0.6B |
|
- **Quantization Method**: GPTQ |
|
- **Compression Libraries**: [llm-compressor](https://github.com/vllm-project/llm-compressor) |
|
- **Calibration Dataset**: ultrachat_200k (1024 samples) |
|
- **Optimized For**: Inference with vLLM |
|
- **License**: same as original model |
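Since the model is optimized for vLLM, a minimal embedding example might look like the sketch below. Flag names and pooling behavior can differ across vLLM versions, so treat this as a starting point and consult the linked guide for the authoritative setup.

```python
# Minimal sketch of running this model as an embedding model in vLLM.
# Version-dependent details (e.g. default pooling) are assumptions.
from vllm import LLM

llm = LLM(model="prudant/Qwen3-Embedding-0.6B-W8A8", task="embed")

outputs = llm.embed(["What is the capital of France?"])
embedding = outputs[0].outputs.embedding  # a list of floats
print(len(embedding))
```

Batching works the same way: pass a list of strings to `llm.embed` and read one embedding per input from the returned list.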