Paper: MMTEB: Massive Multilingual Text Embedding Benchmark (2502.13595)
Normalizing once after truncation at the very end should be sufficient. You don't have to normalize beforehand, although it doesn't hurt.
Good question! Embeddings should be (re-)normalized after the Matryoshka truncation. If you only normalize before truncating, the truncated vector will generally no longer have unit L2 norm, so dot-product scores computed on it won't exactly match cosine similarity.
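To make the point concrete, here is a minimal sketch in plain Python. It is not taken from any particular library: the helper names `l2_normalize`, `l2_norm`, and `truncate_and_normalize` and the toy 6-dimensional vector are made up for illustration, and "normalize" is assumed to mean scaling to unit L2 norm.

```python
import math

def l2_norm(vec):
    """Euclidean (L2) length of a vector."""
    return math.sqrt(sum(x * x for x in vec))

def l2_normalize(vec):
    """Scale vec to unit L2 norm (an all-zero vector is returned as-is)."""
    norm = l2_norm(vec)
    return [x / norm for x in vec] if norm > 0 else list(vec)

def truncate_and_normalize(embedding, dim):
    """Matryoshka-style truncation: keep the first `dim` values, then normalize."""
    return l2_normalize(embedding[:dim])

# Toy "embedding"; real models output hundreds of dimensions.
raw = [0.5, -1.0, 2.0, 0.25, -0.75, 1.5]
full = l2_normalize(raw)

# Normalized before truncation only: the norm drops below 1.
truncated_only = full[:3]

# Normalized after truncation: unit norm again.
renormalized = truncate_and_normalize(full, 3)

# Skipping the first normalization gives the same direction.
from_raw = truncate_and_normalize(raw, 3)

print(l2_norm(truncated_only))
print(l2_norm(renormalized))
print(max(abs(a - b) for a, b in zip(renormalized, from_raw)))
```

The last comparison is why normalizing once at the very end is enough: scaling the full vector first only changes the length of the truncated slice, not its direction, and the final normalization removes that length anyway.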