Getting token embeddings instead of sentence embeddings
#8
by cicada330117 (opened)
Hi all,
I'm trying to load and use this model via llama-cpp. I've downloaded the quantized checkpoint and tried running it on my system:
```python
from llama_cpp import Llama

gguf_embed = Llama(
    model_path="./models/embedding_model_qwen3_gguf/Qwen3-Embedding-0.6B-Q8_0.gguf",
    embedding=True,
)
gembed = gguf_embed.embed("this is just checking")
```
Here, `len(gembed)` is 4 (the number of tokens in the input) and `len(gembed[0])` is 1024 (the embedding dimension). Am I missing something? We should get a single sentence embedding as output, right?
This is not the case when I use the base model with sentence-transformers.
Thanks in advance.
a) Run this model with last-token pooling (`last_pooling`).
b) Don't use these GGUFs; the tokenizer is broken.
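To expand on option (a): Qwen3-Embedding produces the sentence embedding at the last token position, so when llama-cpp returns per-token embeddings you can pool them yourself. Below is a minimal sketch of manual last-token pooling with L2 normalization (as sentence-transformers applies by default); the `fake` array is hypothetical stand-in data shaped like the poster's `gembed` (4 tokens x 1024 dims), not real model output. Some llama-cpp-python builds may also accept a pooling argument at construction time, so check your version's `Llama` signature first.

```python
import numpy as np

def last_token_pool(token_embeddings):
    """Collapse a list of per-token embeddings into one sentence embedding.

    Qwen3-Embedding uses last-token pooling: the hidden state at the
    final (EOS) position represents the whole input sentence.
    """
    emb = np.asarray(token_embeddings, dtype=np.float32)[-1]
    # L2-normalize so cosine similarity reduces to a dot product
    return emb / np.linalg.norm(emb)

# Hypothetical per-token output shaped like gembed: 4 tokens x 1024 dims
fake = np.random.rand(4, 1024).astype(np.float32)
sent = last_token_pool(fake)
```

`sent` is then a single 1024-dim unit vector you can compare across sentences with a dot product.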