MongoDB/mdbr-leaf-ir-asym

Contents

  1. Introduction
  2. Technical Report
  3. Highlights
  4. Benchmarks
  5. Quickstart
  6. Citation

Introduction

mdbr-leaf-ir-asym is a high-performance text embedding model specifically designed for information retrieval (IR) tasks, e.g., the retrieval stage of Retrieval-Augmented Generation (RAG) pipelines.

This model is the asymmetric variant of mdbr-leaf-ir: it uses MongoDB/mdbr-leaf-ir to encode queries and Snowflake/snowflake-arctic-embed-m-v1.5 to encode documents.

The model is robust to vector quantization and MRL truncation.

If you are looking to perform other tasks such as classification, clustering, semantic sentence similarity, or summarization, please check out our mdbr-leaf-mt model.

Note: This model was developed by the ML team of MongoDB Research. At the time of writing, it is not used in any of MongoDB's commercial product or service offerings.

Technical Report

A technical report detailing our proposed LEAF training procedure is available here.

Highlights

  • State-of-the-Art Performance: mdbr-leaf-ir-asym achieves state-of-the-art results for compact embedding models, ranking #1 on the public BEIR benchmark leaderboard for models with ≤100M parameters.
  • Flexible Architecture Support: mdbr-leaf-ir-asym uses an asymmetric retrieval architecture, pairing a compact query encoder with a larger document encoder to achieve even better retrieval quality.
  • MRL and Quantization Support: embedding vectors generated by mdbr-leaf-ir-asym compress well when truncated (MRL) and can be stored using more efficient types like int8 and binary. See below for more information.

Benchmarks

The table below shows the average BEIR benchmark scores (nDCG@10) for mdbr-leaf-ir-asym compared to other retrieval models.

mdbr-leaf-ir ranks #1 on the public BEIR leaderboard, and running it in asymmetric (asym.) mode improves results even further.

| Model                              | Size    | BEIR Avg. (nDCG@10) |
|------------------------------------|---------|---------------------|
| OpenAI text-embedding-3-large      | Unknown | 55.43               |
| mdbr-leaf-ir (asym.)               | 23M     | 54.03               |
| mdbr-leaf-ir                       | 23M     | 53.55               |
| snowflake-arctic-embed-s           | 32M     | 51.98               |
| bge-small-en-v1.5                  | 33M     | 51.65               |
| OpenAI text-embedding-3-small      | Unknown | 51.08               |
| granite-embedding-small-english-r2 | 47M     | 50.87               |
| snowflake-arctic-embed-xs          | 23M     | 50.15               |
| e5-small-v2                        | 33M     | 49.04               |
| SPLADE++                           | 110M    | 48.88               |
| MiniLM-L6-v2                       | 23M     | 41.95               |
| BM25                               | –       | 41.14               |

Quickstart

Sentence Transformers

from sentence_transformers import SentenceTransformer  

# Load the model  
model = SentenceTransformer("MongoDB/mdbr-leaf-ir-asym")  

# Example queries and documents
queries = [
    "What is machine learning?", 
    "How does neural network training work?",
]

documents = [
    "Machine learning is a subset of artificial intelligence that focuses on algorithms that can learn from data.",
    "Neural networks are trained through backpropagation, adjusting weights to minimize prediction errors.",
]

# Encode queries and documents
query_embeddings = model.encode_query(queries)
document_embeddings = model.encode_document(documents)

# Compute similarity scores
scores = model.similarity(query_embeddings, document_embeddings)

# Print results
for i, query in enumerate(queries):
    print(f"Query: {query}")
    for j, doc in enumerate(documents):
        print(f" Similarity: {scores[i, j]:.4f} | Document {j}: {doc[:80]}...")
Example output:
Query: What is machine learning?
 Similarity: 0.6729 | Document 0: Machine learning is a subset of artificial intelligence that focuses on algorith...
 Similarity: 0.4472 | Document 1: Neural networks are trained through backpropagation, adjusting weights to minimi...

Query: How does neural network training work?
 Similarity: 0.4080 | Document 0: Machine learning is a subset of artificial intelligence that focuses on algorith...
 Similarity: 0.5477 | Document 1: Neural networks are trained through backpropagation, adjusting weights to minimi...

Transformers Usage

See the full example notebook here; a rough sketch is also provided below.
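As a minimal sketch of direct use with the transformers library: the snippet below assumes CLS pooling with L2 normalization (mirroring the teacher model) and omits any query-side prompt, so treat the linked notebook as the authoritative recipe.

import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

# Query side: the compact leaf model (the document side would use
# Snowflake/snowflake-arctic-embed-m-v1.5 analogously)
tokenizer = AutoTokenizer.from_pretrained("MongoDB/mdbr-leaf-ir")
model = AutoModel.from_pretrained("MongoDB/mdbr-leaf-ir")

def embed(texts):
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**batch)
    # Assumption: CLS pooling + L2 normalization; see the notebook for the exact recipe
    return F.normalize(outputs.last_hidden_state[:, 0], p=2, dim=-1)

query_embeddings = embed(["What is machine learning?"])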

Asymmetric Retrieval Setup

mdbr-leaf-ir is aligned to snowflake-arctic-embed-m-v1.5, the model it has been distilled from. This enables flexible architectures in which, for example, documents are encoded using the larger model, while queries can be encoded faster and more efficiently with the compact leaf model. This generally outperforms the symmetric setup in which both queries and documents are encoded with leaf.
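For illustration, here is a minimal sketch of wiring this setup up manually with sentence-transformers; loading MongoDB/mdbr-leaf-ir-asym as in the Quickstart handles this routing automatically. It reuses the queries and documents lists from above.

from sentence_transformers import SentenceTransformer

# Compact leaf model encodes the (frequent, latency-sensitive) queries
query_model = SentenceTransformer("MongoDB/mdbr-leaf-ir")

# Larger teacher model encodes the (typically offline) document corpus
doc_model = SentenceTransformer("Snowflake/snowflake-arctic-embed-m-v1.5")

query_embeddings = query_model.encode_query(queries)
document_embeddings = doc_model.encode(documents)

# The aligned embedding spaces make cross-model similarity meaningful
scores = query_model.similarity(query_embeddings, document_embeddings)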

To use the leaf model for both queries and documents, use mdbr-leaf-ir.

MRL Truncation

Embeddings have been trained via MRL and can be truncated for more efficient storage:

query_embeds = model.encode_query(queries, truncate_dim=256)
doc_embeds = model.encode_document(documents, truncate_dim=256)

similarities = model.similarity(query_embeds, doc_embeds)

print('After MRL:')
print(f"* Embeddings dimension: {query_embeds.shape[1]}")
print(f"* Similarities:\n{similarities}")
Example output:
After MRL:
* Embeddings dimension: 256
* Similarities:
tensor([[0.7027, 0.4943],
        [0.4388, 0.5820]])

Vector Quantization

Vector quantization, for example to int8 or binary, can be performed as follows:

Note: For vector quantization to types other than binary, we suggest performing a calibration to determine the optimal ranges; see here, as well as the sketch after the list below. Good initial values, according to the teacher model's documentation, are:

  • int8: -0.3 and +0.3
  • int4: -0.18 and +0.18
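As a minimal sketch of such a calibration, ranges can be derived from a held-out sample of the corpus via the calibration_embeddings argument of quantize_embeddings; the sample_documents list below is a hypothetical stand-in for your own data.

from sentence_transformers.quantization import quantize_embeddings

# Hypothetical held-out sample of the target corpus; a few thousand
# representative documents work well
sample_documents = ["..."]

# Derive calibration ranges from the sample instead of using fixed values
calibration_embeds = model.encode_document(sample_documents)
doc_embeds_int8 = quantize_embeddings(
    model.encode_document(documents),
    precision="int8",
    calibration_embeddings=calibration_embeds,
)

The fixed-range int8 example, using the initial values listed above: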
from sentence_transformers.quantization import quantize_embeddings
import torch

query_embeds = model.encode_query(queries)
doc_embeds = model.encode_document(documents)

# Quantize embeddings to int8 using -0.3 and +0.3 as calibration ranges
ranges = torch.tensor([[-0.3], [+0.3]]).expand(2, query_embeds.shape[1]).cpu().numpy()
query_embeds = quantize_embeddings(query_embeds, "int8", ranges=ranges)
doc_embeds = quantize_embeddings(doc_embeds, "int8", ranges=ranges)

# Calculate similarities; cast to int64 to avoid under/overflow
similarities = query_embeds.astype(int) @ doc_embeds.astype(int).T

print('After quantization:')
print(f"* Embeddings type: {query_embeds.dtype}")
print(f"* Similarities:\n{similarities}")
Example output:
After quantization:
* Embeddings type: int8
* Similarities:
   [[115524  76757]
    [ 69887  94140]]
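Binary quantization compresses embeddings further, to one bit per dimension. Below is a minimal sketch assuming the "ubinary" precision of quantize_embeddings, which packs the sign bits into uint8 bytes; candidates are then ranked by Hamming distance.

import numpy as np
from sentence_transformers.quantization import quantize_embeddings

# Pack each embedding into dim/8 bytes: one sign bit per dimension
query_bits = quantize_embeddings(model.encode_query(queries), "ubinary")
doc_bits = quantize_embeddings(model.encode_document(documents), "ubinary")

# Hamming distance = number of differing bits (lower means more similar)
xor = np.bitwise_xor(query_bits[:, None, :], doc_bits[None, :, :])
hamming_distances = np.unpackbits(xor, axis=-1).sum(axis=-1)

print(f"* Embeddings type: {query_bits.dtype}")
print(f"* Hamming distances:\n{hamming_distances}")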

Evaluation

Please see here.

Citation

If you use this model in your work, please cite:

@misc{mdbr_leaf,
      title={LEAF: Knowledge Distillation of Text Embedding Models with Teacher-Aligned Representations}, 
      author={Robin Vujanic and Thomas Rueckstiess},
      year={2025},
      eprint={2509.12539},
      archivePrefix={arXiv},
      primaryClass={cs.IR},
      url={https://arxiv.org/abs/2509.12539}, 
}

License

This model is released under the Apache 2.0 license.

Contact

For questions or issues, please open an issue or pull request. You can also contact the MongoDB ML research team at [email protected].

Acknowledgments

This model version was created by @tomaarsen; we thank him for his contribution to this project.
