Seems to prefer shorter content? Model config issue?
I've been trying to debug why this model doesn't seem to perform as well as others (e.g. bge-base, also 768 dimensions) for retrieval on Cloudflare's public documentation.
The model supports 2048 tokens, but I'm only chunking to 512 tokens or fewer. Even so, it seems to penalize longer chunks and match shorter ones that are similar in length to the query, even though they're far less relevant.
Here's an example notebook:
The query is "How do I get started with Queues?" and the most relevant snippet in our corpus is in there (titled "Queues · Getting started") – but it appears way down the list in terms of cosine similarity, with a bunch of much shorter, but far less relevant snippets (not even related to Queues at all) – of which I've included a few.
It seems to me that it's penalizing the most relevant document for being longer and starting to include more information (even though this is highly relevant to getting started with queues).
One thing I'm wondering is whether the padding tokens are contributing more to the similarity than the extra content tokens are? i.e. it prefers shorter snippets because they'd be heavily padded, just as the short query would be.
Is this a known issue that even ~500 tokens are too many for it to be accurate? Or is there some adjustment that could be made to the padding or pooling that might help here?
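A crude way to check the padding idea, I think, would be to encode the same short text once on its own and once batched together with a much longer text (so it gets padded), and compare the two embeddings. This is just my own diagnostic sketch, not anything from the model card:

from sentence_transformers import SentenceTransformer
import torch

model = SentenceTransformer("google/embeddinggemma-300m")

short = "task: search result | query: How do I get started with Queues?"
long_doc = "title: Queues · Getting started | text: " + "Cloudflare Queues is a flexible messaging queue. " * 40

# Encoded on its own: no padding is needed for the short text.
alone = model.encode([short], convert_to_tensor=True)

# Encoded in the same batch as a long document: the short text gets padded
# up to the longest sequence in the batch.
batched = model.encode([short, long_doc], convert_to_tensor=True)[0:1]

# If padding tokens leaked into the mean pooling, these two vectors would differ.
print(torch.nn.functional.cosine_similarity(alone, batched))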
From the notebook:
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("google/embeddinggemma-300m")
# The first Queues Getting Started document is clearly the most relevant – but it's longer and has actual content about how to get started
# But the other three documents all score higher, even though they've got nothing to do with Queues
# This happens with other documents and queries too
docs = model.encode([
"title: Queues · Getting started | text: Cloudflare Queues is a flexible messaging queue that allows you to queue messages for asynchronous processing. By following this guide, you will create your first queue, a Worker to publish messages to that queue, and a consumer Worker to consume messages from that queue.\n\n## Prerequisites\n\nTo use Queues, you will need:\n\n1. Sign up for a [Cloudflare account](https://dash.cloudflare.com/sign-up/workers-and-pages).\n2. Install [`Node.js`](https://docs.npmjs.com/downloading-and-installing-node-js-and-npm).\n\n### Node.js version manager\n\nUse a Node version manager like [Volta](https://volta.sh/) or [nvm](https://github.com/nvm-sh/nvm) to avoid permission issues and change Node.js versions. [Wrangler](/workers/wrangler/install-and-update/), discussed later in this guide, requires a Node version of `16.17.0` or later.\n\n## 1. Create a Worker project\n\nYou will access your queue from a Worker, the producer Worker. You must create at least one producer Worker to publish messages onto your queue. If you are using [R2 Bucket Event Notifications](/r2/buckets/event-notifications/), then you do not need a producer Worker.\n\nTo create a producer Worker, run:\n\n```sh\nnpm create cloudflare@latest -- \"producer-worker\n```\n\nThis will create a new directory, which will include both a `src/index.ts` Worker script, and a [`wrangler.jsonc`](/workers/wrangler/configuration/) configuration file. After you create your Worker, you will create a Queue to access.\n\nMove into the newly created directory:\n\n```sh\ncd producer-worker\n```\n\n## 2. Create a queue\n\nTo use queues, you need to create at least one queue to publish messages to and consume messages from.\n\nTo create a queue, run:\n\n```sh\nnpx wrangler queues create \u003CMY-QUEUE-NAME\u003E\n```\n",
"title: Web3 · Ethereum Gateway · Concepts | text: As you get started with Cloudflare's Ethereum Gateway, you may want to read through the following concepts.\n\n:::note\n\nFor help with additional concepts, refer to the [Ethereum documentation](https://ethereum.org/).\n:::",
"title: Load Balancing · Get started | text: Get started with load balancing in one of two ways:\n\n* [Quickstart](/load-balancing/get-started/quickstart/): Get up and running quickly with Load Balancing.\n* [Learning path](/learning-paths/load-balancing/concepts/): Check an in-depth walkthrough for how to plan and set up a load balancer.",
"title: Workers · Tutorials · Build a Slackbot | text: If you want to get started building your own projects, review the existing list of [Quickstart templates](/workers/get-started/quickstarts/).",
])
query = "How do I get started with Queues?"
print(model.similarity(docs, model.encode([f"task: search result | query: {query}"])))
print(model.similarity(docs, model.encode([f"task: question answering | query: {query}"])))
# tensor([[0.4166],
# [0.4299],
# [0.4668],
# [0.4799]])
# tensor([[0.3731],
# [0.4365],
# [0.4623],
# [0.4645]])
weird. it's not padding that's doing it tho. i'm testing the model without sentence transformers so that i can tokenize without padding, and the results are still the same.
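roughly what i mean, as a sketch – tokenize one text at a time so no padding is ever added, then plain mean pooling over the last hidden state (note this skips the Dense projections and final Normalize that the full sentence-transformers stack applies, so absolute scores won't match; it's only meant to take padding out of the equation):

import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("google/embeddinggemma-300m")
model = AutoModel.from_pretrained("google/embeddinggemma-300m")
model.eval()

def embed(text: str) -> torch.Tensor:
    # one text at a time, so the tokenizer never inserts padding tokens
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state  # (1, seq_len, 768)
    # plain mean pooling over all (real) tokens
    return hidden.mean(dim=1)

q = embed("task: search result | query: How do I get started with Queues?")
d = embed("title: Queues · Getting started | text: Cloudflare Queues is a flexible messaging queue ...")
print(F.cosine_similarity(q, d))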
this model seems broken as-is. here are MTEB NanoMSMARCORetrieval scores for this model vs Snowflake/snowflake-arctic-embed-m-v2.0:
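for reference, the run itself was along these lines (sketch – exact task names and the mteb api differ a bit between versions):

import mteb
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("google/embeddinggemma-300m")

tasks = mteb.get_tasks(tasks=["NanoMSMARCORetrieval"])
evaluation = mteb.MTEB(tasks=tasks)
results = evaluation.run(model, output_folder="results/embeddinggemma-300m")

# same thing again with Snowflake/snowflake-arctic-embed-m-v2.0 for the comparison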
Oh awesome, thanks for running that. Seems something's not quite right.
Others are seeing issues too:
https://www.reddit.com/r/LocalLLaMA/comments/1ncfk97/googleembeddinggemma300m_is_broken/
https://www.reddit.com/r/LocalLLaMA/comments/1n8egxb/comment/nceexuz/
https://www.reddit.com/r/LocalLLaMA/comments/1n8egxb/comment/ncf8k9i/
I think something's up with the model weights or config.
If I use a different model source:
!pip install sentence-transformers[onnx]

from sentence_transformers import SentenceTransformer

model = SentenceTransformer(
    "onnx-community/embeddinggemma-300m-ONNX",
    backend="onnx",
    model_kwargs={
        "provider": "CPUExecutionProvider",
        "file_name": "onnx/model.onnx",
    },
)
Then I get much better accuracy:
# The first result is a much better match than the others now
tensor([[0.7392],
[0.4998],
[0.6326],
[0.5803]])
tensor([[0.7187],
[0.5220],
[0.6596],
[0.5953]])
You can try that model out in your browser here too (the demo uses dot product instead of cosine, but same results):
https://huggingface.co/spaces/webml-community/semantic-galaxy
If I print(model) for the ONNX model, I get:
SentenceTransformer(
(0): Transformer({'max_seq_length': 2048, 'do_lower_case': False, 'architecture': 'ORTModelForFeatureExtraction'})
(1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
For the official google/embeddinggemma-300m model, I get:
SentenceTransformer(
(0): Transformer({'max_seq_length': 2048, 'do_lower_case': False, 'architecture': 'Gemma3TextModel'})
(1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
(2): Dense({'in_features': 768, 'out_features': 3072, 'bias': False, 'activation_function': 'torch.nn.modules.linear.Identity'})
(3): Dense({'in_features': 3072, 'out_features': 768, 'bias': False, 'activation_function': 'torch.nn.modules.linear.Identity'})
(4): Normalize()
)
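One difference that stands out is that the Dense and Normalize modules don't show up in the ONNX stack. A quick way to see what each backend actually returns (and whether dot product vs cosine even matters, per the demo note above) is to compare output shapes and norms – rough sketch:

from sentence_transformers import SentenceTransformer

pt_model = SentenceTransformer("google/embeddinggemma-300m")
onnx_model = SentenceTransformer(
    "onnx-community/embeddinggemma-300m-ONNX",
    backend="onnx",
    model_kwargs={
        "provider": "CPUExecutionProvider",
        "file_name": "onnx/model.onnx",
    },
)

text = ["How do I get started with Queues?"]
for name, m in [("pytorch", pt_model), ("onnx", onnx_model)]:
    emb = m.encode(text, convert_to_tensor=True)
    # a norm of ~1.0 means the embeddings are unit length, in which case
    # dot product and cosine similarity rank results identically
    print(name, tuple(emb.shape), emb.norm(dim=1))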
I think I might have encountered a similar problem. When I applied this model to my own retrieval task, it did not work as well as expected.
Also, I noticed that when I run the example code from the model card, the output does not match the values in the comments of the example. I wonder if this is normal?
- example's output:
tensor([[0.3011, 0.6359, 0.4930, 0.4889]])
- my output:
tensor([[0.4989, 0.7087, 0.5910, 0.5932]])
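In case it's an environment difference, this is the kind of thing that's probably worth comparing (just a sketch of what I'd check – library versions and the dtype the model loads in):

import torch
import transformers
import sentence_transformers
from sentence_transformers import SentenceTransformer

print(sentence_transformers.__version__, transformers.__version__, torch.__version__)

model = SentenceTransformer("google/embeddinggemma-300m")
# dtype the weights actually loaded in (float32 vs bfloat16 can shift scores slightly)
print(next(model.parameters()).dtype)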
hey, good catch! here are results for "onnx-community/embeddinggemma-300m-ONNX" on MTEB "AppsRetrieval" and "NanoMSMARCORetrieval": https://pastebin.com/77qwWth4
AppsRetrieval is in the ballpark of what i'd expect for not using the query format; NanoMSMARCORetrieval is improved but still seems low? i probably need to make a custom model wrapper to benchmark it a bit better (rough sketch below), but things are looking up!
interesting comparison between the model configs too
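here's the rough shape of the wrapper i had in mind – just prepending the EmbeddingGemma prompts before encoding. treat it as a sketch: the exact hook mteb calls is version-dependent (newer versions pass prompt_type into encode(), older ones want encode_queries/encode_corpus), and the "title: none" document prefix is my assumption for untitled passages:

import numpy as np
from sentence_transformers import SentenceTransformer

class PromptedEmbeddingGemma:
    def __init__(self, name: str, **st_kwargs):
        self.model = SentenceTransformer(name, **st_kwargs)

    def encode(self, sentences, prompt_type=None, **kwargs) -> np.ndarray:
        # mteb may pass a PromptType enum or a plain string; normalize it
        kind = getattr(prompt_type, "value", prompt_type)
        if kind == "query":
            sentences = [f"task: search result | query: {s}" for s in sentences]
        else:
            # document-side prompt; "none" as the title is assumed for untitled passages
            sentences = [f"title: none | text: {s}" for s in sentences]
        return self.model.encode(
            sentences,
            batch_size=kwargs.get("batch_size", 32),
            convert_to_numpy=True,
        )

wrapped = PromptedEmbeddingGemma("google/embeddinggemma-300m")
# then pass `wrapped` into the same mteb run as above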