Add use case for vLLM & modify Citation
Apologies for not yet responding in the other PR - the "Requires sentence-transformers>=2.7.0" seems good to me! Sentence Transformers has supported the various components necessary to run this model for quite a few versions. Perhaps the only thing that didn't exist yet was the model.similarity method, which was added in 3.0.0 instead.
- Tom Aarsen
model.similarity is not necessary
Agreed. That's why I think 2.7.0 is a totally good minimum version to mention.
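For example, if you need to stay below 3.0.0, you can compute the same cosine similarity matrix with util.cos_sim instead of model.similarity (a minimal sketch; util.cos_sim has been available since well before 2.7.0):
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("Qwen/Qwen3-Embedding-0.6B")
embeddings = model.encode([
    "What is the capital of China?",
    "The capital of China is Beijing.",
])
# util.cos_sim returns the same cosine similarity matrix that
# model.similarity produces by default on >=3.0.0
similarity = util.cos_sim(embeddings[:1], embeddings[1:])
print(similarity)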
like this?
from sentence_transformers import SentenceTransformer
model = SentenceTransformer(r'D:\backup\Qwen3-Embedding-0.6B', truncate_dim=128)
# We recommend enabling flash_attention_2 for better acceleration and memory saving,
# together with setting `padding_side` to "left":
# model = SentenceTransformer(
# "Qwen/Qwen3-Embedding-0.6B",
# revision="refs/pr/2",
# model_kwargs={"attn_implementation": "flash_attention_2", "device_map": "auto"},
# tokenizer_kwargs={"padding_side": "left"},
# )
# The queries and documents to embed
queries = [
"What is the capital of China?",
"Explain gravity",
]
documents = [
"The capital of China is Beijing.",
"Gravity is a force that attracts two bodies towards each other. It gives weight to physical objects and is responsible for the movement of planets around the sun.",
]
# Encode the queries and documents. Note that queries benefit from using a prompt
# Here we use the prompt called "query" stored under `model.prompts`, but you can
# also pass your own prompt via the `prompt` argument
query_embeddings = model.encode(queries, prompt_name="query", prompt="Given a web search query, retrieve relevant passages that answer the query")
document_embeddings = model.encode(documents)
# Compute the (cosine) similarity between the query and document embeddings
similarity = model.similarity(query_embeddings, document_embeddings)
print(similarity)
But two different results were obtained.
① By the torch demo with the prompt "Given a web search query, retrieve relevant passages that answer the query":
# 128 dim
# [[0.7877246737480164, 0.21917583048343658], [0.18308258056640625, 0.6724958419799805]]
② With the code above:
model = SentenceTransformer(r'D:\backup\Qwen3-Embedding-0.6B', truncate_dim=128)
query_embeddings = model.encode(queries, prompt_name="query", prompt="Given a web search query, retrieve relevant passages that answer the query")
and got this result:
tensor([[0.7948, 0.2974],
[0.2260, 0.7116]])
You can either provide prompt or prompt_name. If you provide both, prompt has priority.
These two give different results:
query_embeddings = model.encode(queries, prompt="Given a web search query, retrieve relevant passages that answer the query")
# tensor([[0.7948, 0.2974],
# [0.2260, 0.7116]])
query_embeddings = model.encode(queries, prompt_name="query")
# tensor([[0.7877, 0.2192],
# [0.1831, 0.6725]])
because the prompt stored in the model under the name "query" is not "Given a web search query, retrieve relevant passages that answer the query", but rather:
print(repr(model.prompts["query"]))
# 'Instruct: Given a web search query, retrieve relevant passages that answer the query\nQuery:'
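(As a side note, you can list every named prompt a model ships with by iterating over the model.prompts dictionary; the exact keys and strings depend on the model's configuration:)
# model.prompts is a plain dict mapping prompt names to prompt strings
for name, text in model.prompts.items():
    print(f"{name!r}: {text!r}")
# 'query': 'Instruct: Given a web search query, retrieve relevant passages that answer the query\nQuery:'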
So, these two are the same:
query_embeddings = model.encode(queries, prompt="Instruct: Given a web search query, retrieve relevant passages that answer the query\nQuery:")
# tensor([[0.7877, 0.2192],
# [0.1831, 0.6725]])
query_embeddings = model.encode(queries, prompt_name="query")
# tensor([[0.7877, 0.2192],
# [0.1831, 0.6725]])
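If you want to double-check that equivalence yourself, the two calls should produce identical embeddings (a quick sketch, assuming numpy is installed):
import numpy as np

explicit = model.encode(queries, prompt="Instruct: Given a web search query, retrieve relevant passages that answer the query\nQuery:")
named = model.encode(queries, prompt_name="query")
# Expected to print True, since both calls prepend the exact same string
print(np.allclose(explicit, named))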
I hope that makes sense now!
- Tom Aarsen
Thanks!