Gemma Family Collection
LiteRT models in the Gemma Family (11 items)
Main Model Card: google/embeddinggemma-300m
This model card provides a few variants of the EmbeddingGemma model that are ready for deployment on Android and iOS using LiteRT, or on Android via the Google AI Edge RAG Library.
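For illustration, here is a minimal Kotlin sketch of running one of these variants directly through the LiteRT (TensorFlow Lite) `Interpreter` API. The file name, the single `[1, maxSeqLen]` int32 input of SentencePiece token ids, and the `[1, 768]` float32 output are assumptions rather than the published signature, so verify the actual tensor shapes with `getInputTensor`/`getOutputTensor` before relying on this; tokenization itself is out of scope here.

```kotlin
import org.tensorflow.lite.Interpreter
import java.io.File

// Hedged sketch: embed one pre-tokenized string with an EmbeddingGemma
// LiteRT variant. The input/output shapes below are assumptions; check
// the real model signature before use.
fun embed(modelFile: File, tokenIds: IntArray, maxSeqLen: Int = 256): FloatArray {
    Interpreter(modelFile).use { interpreter ->
        // Pad (or truncate) the token ids to the fixed sequence length.
        val input = Array(1) { IntArray(maxSeqLen) { i -> tokenIds.getOrElse(i) { 0 } } }
        val output = Array(1) { FloatArray(768) } // assumed 768-dim embedding output
        interpreter.run(input, output)
        return output[0]
    }
}
```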
Note that all benchmark stats below were collected on a Samsung S25 Ultra.
| Backend | Quantization | Max sequence length | Init time (ms) | Inference time (ms) | Memory (RSS in MB) | Model size (MB) |
|---|---|---|---|---|---|---|
| GPU | Mixed Precision* | 256 | 1175 | 64 | 762 | 179 |
| GPU | Mixed Precision* | 512 | 1445 | 119 | 762 | 179 |
| GPU | Mixed Precision* | 1024 | 1545 | 241 | 771 | 183 |
| GPU | Mixed Precision* | 2048 | 1707 | 683 | 786 | 196 |
| CPU | Mixed Precision* | 256 | 17.6 | 66 | 110 | 179 |
| CPU | Mixed Precision* | 512 | 24.9 | 169 | 123 | 179 |
| CPU | Mixed Precision* | 1024 | 35.4 | 549 | 169 | 183 |
| CPU | Mixed Precision* | 2048 | 35.8 | 2455 | 333 | 196 |
*Mixed Precision refers to per-channel quantization with int4 for embeddings, feedforward, and projection layers, and int8 for attention (e4_a8_f4_p4).
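As a rough sanity check on the model size column: ~300M parameters at 4 bits each comes to roughly 150 MB, so the reported 179–196 MB is consistent with the mostly-int4 scheme plus the int8 attention weights and per-channel scale metadata.

The GPU and CPU rows differ only in which backend the interpreter is initialized with: the GPU delegate pays a much higher one-time init cost but scales better to long sequences, while the CPU backend initializes almost instantly but its inference time grows steeply with sequence length. A minimal Kotlin sketch of that choice, assuming the standard LiteRT GPU delegate artifact (`org.tensorflow.lite:tensorflow-lite-gpu`):

```kotlin
import org.tensorflow.lite.Interpreter
import org.tensorflow.lite.gpu.GpuDelegate
import java.io.File

// Sketch: pick the backend benchmarked in the table above.
fun makeInterpreter(modelFile: File, useGpu: Boolean): Interpreter {
    val options = Interpreter.Options()
    if (useGpu) {
        // GPU rows: ~1.2-1.7 s init, better inference scaling at 1024+ tokens.
        options.addDelegate(GpuDelegate())
    } else {
        // CPU rows: ~18-36 ms init; the thread count is an illustrative choice.
        options.setNumThreads(4)
    }
    return Interpreter(modelFile, options)
}
```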
Notes: