Add use case for vLLM & modify citation

#5 by zyznull (Qwen org) - opened
No description provided.
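
Since no description is provided, here is a rough sketch of the kind of vLLM use case the title presumably adds. It assumes a vLLM version whose LLM class supports the embedding task (LLM(..., task="embed") and llm.embed(), roughly vllm>=0.8); the exact snippet in this PR may differ:

import torch
from vllm import LLM

# Queries get the instruct template; documents are embedded as-is
queries = [
    "Instruct: Given a web search query, retrieve relevant passages that answer the query\nQuery:What is the capital of China?",
]
documents = ["The capital of China is Beijing."]

llm = LLM(model="Qwen/Qwen3-Embedding-0.6B", task="embed")
outputs = llm.embed(queries + documents)
embeddings = torch.tensor([o.outputs.embedding for o in outputs])

# Normalize and take the dot product to get cosine similarities
embeddings = torch.nn.functional.normalize(embeddings, p=2, dim=1)
print(embeddings[: len(queries)] @ embeddings[len(queries) :].T)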

Apologies for not yet responding in the other PR - the "Requires sentence-transformers>=2.7.0" note seems good to me! Sentence Transformers has supported the various components necessary to run this model for quite a few versions. Perhaps the only thing that didn't exist yet was the model.similarity method, which was added in 3.0.0 instead.

  • Tom Aarsen

model.similarity is not necessary

Agreed. That's why I think 2.7.0 is a totally good minimum version to mention.
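
For anyone on 2.7.0 without model.similarity: a minimal sketch using the long-standing cos_sim helper from sentence_transformers.util, which gives the same cosine similarities:

from sentence_transformers import SentenceTransformer
from sentence_transformers.util import cos_sim

model = SentenceTransformer("Qwen/Qwen3-Embedding-0.6B")
query_embeddings = model.encode(["What is the capital of China?"], prompt_name="query")
document_embeddings = model.encode(["The capital of China is Beijing."])

# Equivalent to model.similarity(...) with the default cosine similarity function
print(cos_sim(query_embeddings, document_embeddings))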

like this?

from sentence_transformers import SentenceTransformer
model = SentenceTransformer(r'D:\backup\Qwen3-Embedding-0.6B', truncate_dim=128)

# We recommend enabling flash_attention_2 for better acceleration and memory saving,
# together with setting `padding_side` to "left":
# model = SentenceTransformer(
#     "Qwen/Qwen3-Embedding-0.6B",
#     revision="refs/pr/2",
#     model_kwargs={"attn_implementation": "flash_attention_2", "device_map": "auto"},
#     tokenizer_kwargs={"padding_side": "left"},
# )

# The queries and documents to embed
queries = [
    "What is the capital of China?",
    "Explain gravity",
]
documents = [
    "The capital of China is Beijing.",
    "Gravity is a force that attracts two bodies towards each other. It gives weight to physical objects and is responsible for the movement of planets around the sun.",
]

# Encode the queries and documents. Note that queries benefit from using a prompt
# Here we use the prompt called "query" stored under `model.prompts`, but you can
# also pass your own prompt via the `prompt` argument
query_embeddings = model.encode(queries, prompt_name="query", prompt="Given a web search query, retrieve relevant passages that answer the query")
document_embeddings = model.encode(documents)

# Compute the (cosine) similarity between the query and document embeddings
similarity = model.similarity(query_embeddings, document_embeddings)
print(similarity)

But I got two different results:

① From the torch demo, with the prompt "Given a web search query, retrieve relevant passages that answer the query":

# 128 dim
# [[0.7877246737480164, 0.21917583048343658], [0.18308258056640625, 0.6724958419799805]]

② From the code above:

model = SentenceTransformer(r'D:\backup\Qwen3-Embedding-0.6B', truncate_dim=128)
query_embeddings = model.encode(queries, prompt_name="query", prompt="Given a web search query, retrieve relevant passages that answer the query")

I got this result:

tensor([[0.7948, 0.2974],
        [0.2260, 0.7116]])

You can either provide prompt or prompt_name. If you provide both, prompt has priority.
These two give different results:

query_embeddings = model.encode(queries, prompt="Given a web search query, retrieve relevant passages that answer the query")
# tensor([[0.7948, 0.2974],
#         [0.2260, 0.7116]])
query_embeddings = model.encode(queries, prompt_name="query")
# tensor([[0.7877, 0.2192],
#         [0.1831, 0.6725]])

This is because the prompt stored in the model under the name "query" is not "Given a web search query, retrieve relevant passages that answer the query"; it's actually:

print(repr(model.prompts["query"]))
# 'Instruct: Given a web search query, retrieve relevant passages that answer the query\nQuery:'

So, these two are the same:

query_embeddings = model.encode(queries, prompt="Instruct: Given a web search query, retrieve relevant passages that answer the query\nQuery:")
# tensor([[0.7877, 0.2192],
#         [0.1831, 0.6725]])
query_embeddings = model.encode(queries, prompt_name="query")
# tensor([[0.7877, 0.2192],
#         [0.1831, 0.6725]])
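
In other words, if you want to use a custom instruction via prompt, wrap it in the same template yourself. A small sketch (the helper name is just illustrative):

def instruct_prompt(task: str) -> str:
    # Reproduces the template stored under model.prompts["query"]
    return f"Instruct: {task}\nQuery:"

task = "Given a web search query, retrieve relevant passages that answer the query"
query_embeddings = model.encode(queries, prompt=instruct_prompt(task))
# Identical to model.encode(queries, prompt_name="query")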

I hope that makes sense now!

  • Tom Aarsen
littlebird13 changed pull request status to merged

Thanks!
