---
license: apache-2.0
datasets:
- HuggingFaceH4/ultrachat_200k
language:
- en
- es
base_model:
- Qwen/Qwen3-Embedding-0.6B
pipeline_tag: feature-extraction
---
|
|
|
# prudant/Qwen3-Embedding-0.6B-W8A8 |
|
|
|
This is a W8A8-compressed version of Qwen/Qwen3-Embedding-0.6B, quantized with [llm-compressor](https://github.com/vllm-project/llm-compressor).
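As a rough illustration, a W8A8 GPTQ run with llm-compressor typically looks like the sketch below. This is an assumption-laden reconstruction, not the exact recipe used for this checkpoint; argument names and defaults can vary between llm-compressor versions.

```python
# Hypothetical sketch of a W8A8 GPTQ compression run with llm-compressor.
# The recipe details (targets, ignore list, sequence length) are assumptions;
# only the scheme, dataset, and sample count come from this model card.
from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import GPTQModifier

recipe = GPTQModifier(
    targets="Linear",    # quantize the Linear layers
    scheme="W8A8",       # 8-bit weights, 8-bit activations
    ignore=["lm_head"],  # commonly kept in full precision
)

oneshot(
    model="Qwen/Qwen3-Embedding-0.6B",
    dataset="ultrachat_200k",      # calibration dataset from this card
    recipe=recipe,
    num_calibration_samples=1024,  # sample count from this card
    max_seq_length=2048,           # assumed calibration sequence length
)
```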
|
|
|
**Important**: You MUST read the following guide for correct usage of this model: [Guide](https://github.com/vllm-project/vllm/pull/19260)
|
|
|
## Model Details |
|
|
|
- **Original Model**: Qwen/Qwen3-Embedding-0.6B |
|
- **Quantization Method**: GPTQ |
|
- **Compression Libraries**: [llm-compressor](https://github.com/vllm-project/llm-compressor) |
|
- **Calibration Dataset**: ultrachat_200k (1024 samples) |
|
- **Optimized For**: Inference with vLLM |
|
- **License**: same as original model |
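Since the model is optimized for vLLM, a minimal embedding example might look like the sketch below. Flag names and pooling behavior can differ across vLLM versions, so treat this as a starting point and consult the linked guide for the authoritative setup.

```python
# Minimal sketch of running this model as an embedding model in vLLM.
# Version-dependent details (e.g. default pooling) are assumptions.
from vllm import LLM

llm = LLM(model="prudant/Qwen3-Embedding-0.6B-W8A8", task="embed")

outputs = llm.embed(["What is the capital of France?"])
embedding = outputs[0].outputs.embedding  # a list of floats
print(len(embedding))
```

Batching works the same way: pass a list of strings to `llm.embed` and read one embedding per input from the returned list.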