Can it be used with sentence-transformers?
#3 opened by zrzakhan
Can this model be used with, or is it supported by, sentence-transformers?
I've worked on adding support in this pull request:
Until it's merged, you can use, e.g.:
# Requires transformers>=4.51.0
from sentence_transformers import SentenceTransformer

# Load the model
model = SentenceTransformer("Qwen/Qwen3-Embedding-0.6B", revision="refs/pr/2")

# We recommend enabling flash_attention_2 for better acceleration and memory saving,
# together with setting `padding_side` to "left":
# model = SentenceTransformer(
#     "Qwen/Qwen3-Embedding-0.6B",
#     revision="refs/pr/2",
#     model_kwargs={"attn_implementation": "flash_attention_2", "device_map": "auto"},
#     tokenizer_kwargs={"padding_side": "left"},
# )

# The queries and documents to embed
queries = [
    "What is the capital of China?",
    "Explain gravity",
]
documents = [
    "The capital of China is Beijing.",
    "Gravity is a force that attracts two bodies towards each other. It gives weight to physical objects and is responsible for the movement of planets around the sun.",
]

# Encode the queries and documents. Note that queries benefit from using a prompt.
# Here we use the prompt called "query" stored under `model.prompts`, but you can
# also pass your own prompt via the `prompt` argument.
query_embeddings = model.encode(queries, prompt_name="query")
document_embeddings = model.encode(documents)

# Compute the (cosine) similarity between the query and document embeddings
similarity = model.similarity(query_embeddings, document_embeddings)
print(similarity)
# tensor([[0.7646, 0.1414],
#         [0.1355, 0.6000]])
(Note the `revision` argument: `refs/pr/2` loads the model files from the open pull request on the Hub repository rather than from the main branch.)
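As a side note on the `prompt` argument mentioned in the comments above, you can also pass a task-specific instruction string directly instead of using the stored "query" prompt. A minimal sketch, where the instruction text is only an illustrative assumption and not an official recommendation:

# Sketch: pass a custom instruction via `prompt` instead of the stored "query" prompt.
# The instruction text below is an example only; tailor it to your retrieval task.
custom_prompt = (
    "Instruct: Given a web search query, retrieve relevant passages that answer the query\n"
    "Query: "
)
custom_query_embeddings = model.encode(queries, prompt=custom_prompt)
print(custom_query_embeddings.shape)  # e.g. (2, 1024) for the 0.6B model

The `prompt` string is simply prepended to each input before encoding, so documents can still be encoded without any prompt, as above.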
I also have similar PRs open for the 4B and 8B models. Once they're merged, you can use this snippet without the `revision` argument, as sketched below.
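A sketch of what that would look like, assuming the repository names follow the same pattern as the 0.6B model:

# Sketch: after the PRs are merged, the plain model IDs should work directly
# (repository names assumed to follow the 0.6B naming pattern).
model_4b = SentenceTransformer("Qwen/Qwen3-Embedding-4B")
model_8b = SentenceTransformer("Qwen/Qwen3-Embedding-8B")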
- Tom Aarsen
That’s great! Thank you, Tom!