Getting token embeddings instead of sentence embeddings

#8 opened by cicada330117

Hi all,

I'm trying to load and use this model via llama-cpp-python. I've downloaded the quantized checkpoint and tried running it on my system:

from llama_cpp import Llama

gguf_embed = Llama(
    model_path="./models/embedding_model_qwen3_gguf/Qwen3-Embedding-0.6B-Q8_0.gguf",
    embedding=True,
)

gembed = gguf_embed.embed("this is just checking")

Here, len(gembed) comes out as 4 (the number of tokens in the input) and len(gembed[0]) is 1024 (the embedding dimension), so I'm getting one vector per token.

Am I missing something? We should get a single sentence embedding as output, right?

This is not the case when I use the base model with sentence-transformers, which returns one vector per sentence, for example:
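A minimal sketch of what I mean; I'm assuming the base (non-GGUF) checkpoint id is Qwen/Qwen3-Embedding-0.6B:

from sentence_transformers import SentenceTransformer

# Load the base (non-GGUF) checkpoint
st_model = SentenceTransformer("Qwen/Qwen3-Embedding-0.6B")

# encode() pools internally and returns one vector per input sentence
emb = st_model.encode("this is just checking")
print(emb.shape)  # (1024,) -- a single 1024-dim sentence embedding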

Thanks in advance.

a) Run this model with last-token pooling (see the sketch below).

b) Don't use these GGUFs; the tokenizer is broken.
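For (a), a minimal sketch with llama-cpp-python, assuming a version recent enough to expose the pooling_type parameter (the LLAMA_POOLING_TYPE_* constants mirror llama.cpp's pooling enum); the model path is the one from the original post:

import llama_cpp
from llama_cpp import Llama

gguf_embed = Llama(
    model_path="./models/embedding_model_qwen3_gguf/Qwen3-Embedding-0.6B-Q8_0.gguf",
    embedding=True,
    # Pool per-token states into one vector per input
    # (Qwen3-Embedding uses last-token pooling).
    pooling_type=llama_cpp.LLAMA_POOLING_TYPE_LAST,
)

gembed = gguf_embed.embed("this is just checking")
print(len(gembed))  # now the embedding dimension (1024), not the token count

If your llama-cpp-python predates pooling_type, you can pool manually instead: take the last row of the per-token output and L2-normalize it. The llama.cpp CLI equivalent is the --pooling last flag on llama-embedding / llama-server.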
