---
license: apache-2.0
base_model: Qwen/Qwen3-Embedding-0.6B
tags:
- qwen3
- text-embeddings-inference
- onnx
- sentence-transformers
- feature-extraction
- sentence-similarity
language:
- multilingual
pipeline_tag: sentence-similarity
library_name: sentence-transformers
---

# Qwen3-Embedding-0.6B ONNX for TEI

This is an ONNX version of [Qwen/Qwen3-Embedding-0.6B](https://huggingface.co/Qwen/Qwen3-Embedding-0.6B) optimized for [Text Embeddings Inference (TEI)](https://github.com/huggingface/text-embeddings-inference).

## Model Details

- **Base Model**: Qwen/Qwen3-Embedding-0.6B
- **Format**: ONNX with external data (`model.onnx` + `model.onnx_data`)
- **Pooling**: Mean pooling (built into the ONNX graph)
- **Embedding Dimension**: 1024
- **Max Sequence Length**: 32768 tokens

## Usage with TEI

```bash
docker run --gpus all -p 8080:80 -v $PWD:/data \
  ghcr.io/huggingface/text-embeddings-inference:latest \
  --model-id janni-t/qwen3-embedding-0.6b-tei-onnx
```

For CPU inference:
```bash
docker run -p 8080:80 -v $PWD:/data \
  ghcr.io/huggingface/text-embeddings-inference:cpu-latest \
  --model-id janni-t/qwen3-embedding-0.6b-tei-onnx
```
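Once a server is running, embeddings are served over HTTP via TEI's `POST /embed` endpoint, which returns one vector per input string. A minimal client sketch using only the Python standard library (the `localhost:8080` URL assumes the port mapping from the commands above):

```python
import json
import math
from urllib import request

def embed(texts, url="http://localhost:8080/embed"):
    """POST a batch of texts to TEI's /embed endpoint; returns one vector per text."""
    data = json.dumps({"inputs": texts}).encode("utf-8")
    req = request.Request(url, data=data, headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        return json.loads(resp.read())

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))
```

For example, `embed(["What is the capital of China?", "Beijing is China's capital."])` returns two 1024-dimensional vectors whose similarity can be scored with `cosine`.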

## Conversion Details

This model was converted from the original PyTorch model to ONNX format with:
- Consolidated external data for TEI compatibility
- Mean pooling integrated into the ONNX graph
- Optimized for CPU inference
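Because pooling is baked into the graph, the model's output is already a single sentence vector per input; no client-side pooling is required. For reference, attention-mask-weighted mean pooling (what the graph computes over the token hidden states) can be sketched in plain Python:

```python
def mean_pool(hidden_states, attention_mask):
    """Average token vectors, counting only positions where attention_mask is 1.

    hidden_states: list of per-token vectors, shape [seq_len][dim]
    attention_mask: list of 0/1 ints, shape [seq_len]
    """
    dim = len(hidden_states[0])
    totals = [0.0] * dim
    count = 0
    for token_vec, mask in zip(hidden_states, attention_mask):
        if mask:
            count += 1
            for j in range(dim):
                totals[j] += token_vec[j]
    return [t / max(count, 1) for t in totals]
```

Masked positions (padding) are excluded from the average, so padded and unpadded batches yield the same embedding for a given text.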

## Original Model

See the original model card at [Qwen/Qwen3-Embedding-0.6B](https://huggingface.co/Qwen/Qwen3-Embedding-0.6B) for:
- Model architecture details
- Training information
- Benchmark results
- Citation information

## License

Apache 2.0 (same as the original model)