---
license: apache-2.0
base_model: microsoft/MiniLM-L6-v2
tags:
- transformers
- sentence-transformers
- sentence-similarity
- feature-extraction
- text-embeddings-inference
- information-retrieval
- knowledge-distillation
language:
- en
---
<div style="display: flex; justify-content: center;">
<div style="display: flex; align-items: center; gap: 10px;">
<img src="logo.webp" alt="MongoDB Logo" style="height: 36px; width: auto; border-radius: 4px;">
<span style="font-size: 32px; font-weight: bold">MongoDB/mdbr-leaf-mt</span>
</div>
</div>
# Introduction
`mdbr-leaf-mt` is a compact high-performance text embedding model designed for classification, clustering, semantic sentence similarity and summarization tasks.
To enable even greater efficiency, `mdbr-leaf-mt` supports [flexible asymmetric architectures](#asymmetric-retrieval-setup) and is robust to [vector quantization](#vector-quantization) and [MRL truncation](#mrl-truncation).
If you are looking to perform semantic search / information retrieval (e.g. for RAG), please check out our [`mdbr-leaf-ir`](https://huggingface.co/MongoDB/mdbr-leaf-ir) model, which is specifically trained for these tasks.
> [!Note]
> This model has been developed by the ML team of MongoDB Research. At the time of writing, it is not used in any of MongoDB's commercial product or service offerings.
# Technical Report
A technical report detailing our proposed `LEAF` training procedure is [available here (TBD)](http://FILL_HERE_ARXIV_LINK).
# Highlights
* **State-of-the-Art Performance**: `mdbr-leaf-mt` achieves new state-of-the-art results for compact embedding models, ranking <span style="color:red">#TBD</span> on the [public MTEB v2 (Eng) benchmark leaderboard](https://huggingface.co/spaces/mteb/leaderboard) for models <30M parameters with an average score of <span style="color:red">[TBD HERE]</span>.
* **Flexible Architecture Support**: `mdbr-leaf-mt` supports asymmetric retrieval architectures, in which a larger model encodes documents while `mdbr-leaf-mt` encodes queries; this can improve retrieval results further. [See below](#asymmetric-retrieval-setup) for more information.
* **MRL and Quantization Support**: embedding vectors generated by `mdbr-leaf-mt` remain effective when truncated (MRL) and/or stored using more compact types such as `int8` and `binary`. See [MRL Truncation](#mrl-truncation) and [Vector Quantization](#vector-quantization) for more information.
# Quickstart
## Sentence Transformers
```python
from sentence_transformers import SentenceTransformer
# Load the model
model = SentenceTransformer("MongoDB/mdbr-leaf-mt")
# Example queries and documents
queries = [
"What is machine learning?",
"How does neural network training work?"
]
documents = [
"Machine learning is a subset of artificial intelligence that focuses on algorithms that can learn from data.",
"Neural networks are trained through backpropagation, adjusting weights to minimize prediction errors."
]
# Encode queries and documents
query_embeddings = model.encode(queries, prompt_name="query")
document_embeddings = model.encode(documents)
# Compute similarity scores
scores = model.similarity(query_embeddings, document_embeddings)
# Print results
for i, query in enumerate(queries):
    print(f"Query: {query}")
    for j, doc in enumerate(documents):
        print(f" Similarity: {scores[i, j]:.4f} | Document {j}: {doc[:80]}...")
# Query: What is machine learning?
# Similarity: 0.9063 | Document 0: Machine learning is a subset of ...
# Similarity: 0.7287 | Document 1: Neural networks are trained ...
#
# Query: How does neural network training work?
# Similarity: 0.6725 | Document 0: Machine learning is a subset of ...
# Similarity: 0.8287 | Document 1: Neural networks are trained ...
```
## Transformers Usage
See the [example notebook](https://huggingface.co/MongoDB/mdbr-leaf-mt/blob/main/transformers_example_mt.ipynb).
## Asymmetric Retrieval Setup
`mdbr-leaf-mt` is *aligned* to [`mxbai-embed-large-v1`](https://huggingface.co/mixedbread-ai/mxbai-embed-large-v1), the model it has been distilled from, making the asymmetric system below possible:
```python
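from sentence_transformers import SentenceTransformer
# `queries` and `documents` are the lists defined in the Quickstart above
# Use mdbr-leaf-mt for query encoding (real-time, low latency)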
query_model = SentenceTransformer("MongoDB/mdbr-leaf-mt")
query_embeddings = query_model.encode(queries, prompt_name="query")
# Use a larger model for document encoding (one-time, at index time)
doc_model = SentenceTransformer("mixedbread-ai/mxbai-embed-large-v1")
document_embeddings = doc_model.encode(documents)
# Compute similarities
scores = query_model.similarity(query_embeddings, document_embeddings)
```
Retrieval results in asymmetric mode are usually superior to those of the [standard mode above](#sentence-transformers).
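The asymmetric flow amounts to building a document index once and scoring incoming query vectors against it. The sketch below illustrates this with NumPy only, using random vectors in place of real model output. `build_index` and `search` are hypothetical helpers, not part of any library, and the 1024-dimensional width is an assumption. The dimension check reflects the requirement that both encoders emit vectors of the same size.

```python
import numpy as np

def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit length, guarding against all-zero rows."""
    return x / np.clip(np.linalg.norm(x, axis=1, keepdims=True), 1e-12, None)

def build_index(document_embeddings: np.ndarray) -> np.ndarray:
    """Index time: normalize the document vectors once and keep the matrix."""
    return l2_normalize(document_embeddings)

def search(index: np.ndarray, query_embeddings: np.ndarray, top_k: int = 2):
    """Query time: cosine similarity against the index, best matches first."""
    if query_embeddings.shape[1] != index.shape[1]:
        raise ValueError("query and document embeddings must share one dimension")
    scores = l2_normalize(query_embeddings) @ index.T
    top = np.argsort(-scores, axis=1)[:, :top_k]
    return top, np.take_along_axis(scores, top, axis=1)

rng = np.random.default_rng(0)
# Stand-in for the large document model's output (assumed 1024 dimensions)
doc_vectors = rng.standard_normal((5, 1024))
# Stand-in for the small query model's output: noisy copies of documents 3 and 0,
# mimicking a query encoder that is aligned to the document encoder
query_vectors = doc_vectors[[3, 0]] + 0.1 * rng.standard_normal((2, 1024))

index = build_index(doc_vectors)
top_ids, top_scores = search(index, query_vectors)
print("Best document per query:", top_ids[:, 0])
```

In a real deployment, `doc_vectors` would come from the larger model at index time and `query_vectors` from `mdbr-leaf-mt` at request time; only the query encoder sits on the latency-critical path.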
## MRL Truncation
Embeddings have been trained via [MRL](https://arxiv.org/abs/2205.13147) and can be truncated for more efficient storage:
```python
from torch.nn import functional as F
query_embeds = model.encode(queries, prompt_name="query", convert_to_tensor=True)
doc_embeds = model.encode(documents, convert_to_tensor=True)
# Truncate and normalize according to MRL
query_embeds = F.normalize(query_embeds[:, :256], dim=-1)
doc_embeds = F.normalize(doc_embeds[:, :256], dim=-1)
similarities = model.similarity(query_embeds, doc_embeds)
print('After MRL:')
print(f"* Embeddings dimension: {query_embeds.shape[1]}")
print(f"* Similarities:\n\t{similarities}")
# After MRL:
# * Embeddings dimension: 256
# * Similarities:
# tensor([[0.9164, 0.7219],
# [0.6682, 0.8393]], device='cuda:0')
```
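To make the storage trade-off concrete, the following sketch applies the same truncate-then-renormalize step with NumPy only. Random vectors stand in for model output; the 1024-dimensional full width and the helper name `truncate_and_normalize` are assumptions made for illustration.

```python
import numpy as np

def truncate_and_normalize(embeddings: np.ndarray, dim: int) -> np.ndarray:
    """Keep the first `dim` components of each row and rescale rows to unit length."""
    if not 0 < dim <= embeddings.shape[1]:
        raise ValueError(f"dim must be in [1, {embeddings.shape[1]}], got {dim}")
    truncated = embeddings[:, :dim]
    norms = np.linalg.norm(truncated, axis=1, keepdims=True)
    # Guard against division by zero for all-zero rows
    return truncated / np.clip(norms, 1e-12, None)

# Random stand-ins for model output (hypothetical 1024-dim float32 embeddings)
rng = np.random.default_rng(0)
full = rng.standard_normal((4, 1024)).astype(np.float32)

for dim in (1024, 512, 256):
    reduced = truncate_and_normalize(full, dim)
    bytes_per_vector = reduced.shape[1] * reduced.dtype.itemsize
    print(f"dim={dim:4d}  bytes/vector={bytes_per_vector}")
```

Storage scales linearly with the number of retained dimensions, so keeping 256 of 1024 `float32` components cuts the footprint per vector to a quarter. Renormalizing after truncation keeps dot products equivalent to cosine similarity.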
## Vector Quantization
Embeddings can be quantized to more compact types such as `int8` or `binary`.
**Note**: For vector quantization to types other than binary, we suggest performing a calibration to determine the optimal ranges; [see here](https://sbert.net/examples/sentence_transformer/applications/embedding-quantization/README.html#scalar-int8-quantization).
Without calibration, -1.0 and +1.0 are good initial values for the range bounds. The following example quantizes to `int8`:
```python
from sentence_transformers.quantization import quantize_embeddings
import torch
query_embeds = model.encode(queries, prompt_name="query")
doc_embeds = model.encode(documents)
# Quantize embeddings to int8 using -1.0 and +1.0
ranges = torch.tensor([[-1.0], [+1.0]]).expand(2, query_embeds.shape[1]).cpu().numpy()
query_embeds = quantize_embeddings(query_embeds, "int8", ranges=ranges)
doc_embeds = quantize_embeddings(doc_embeds, "int8", ranges=ranges)
# Calculate similarities; cast to int64 explicitly to avoid under/overflow
similarities = query_embeds.astype("int64") @ doc_embeds.astype("int64").T
print('After quantization:')
print(f"* Embeddings type: {query_embeds.dtype}")
print(f"* Similarities:\n{similarities}")
# After quantization:
# * Embeddings type: int8
# * Similarities:
# [[2202032 1422868]
# [1421197 1845580]]
```
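The calibration step and the `binary` case mentioned above can be sketched in plain NumPy. This is an illustration of the general techniques under stated assumptions, not the implementation used by `sentence_transformers`: the helper names are hypothetical, random vectors stand in for real embeddings, and the binary variant packs sign bits into unsigned bytes.

```python
import numpy as np

def calibrate_ranges(calibration_embeddings: np.ndarray) -> np.ndarray:
    """Per-dimension [min, max] taken from a sample of embeddings, shape (2, dim)."""
    return np.vstack([calibration_embeddings.min(axis=0),
                      calibration_embeddings.max(axis=0)])

def quantize_int8(embeddings: np.ndarray, ranges: np.ndarray) -> np.ndarray:
    """Map each dimension's [min, max] linearly onto the int8 range [-128, 127]."""
    starts = ranges[0]
    steps = (ranges[1] - ranges[0]) / 255.0
    steps = np.where(steps == 0, 1.0, steps)  # avoid dividing by zero on constant dimensions
    scaled = np.round((embeddings - starts) / steps) - 128
    # Clip values outside the calibrated range instead of letting them wrap around
    return np.clip(scaled, -128, 127).astype(np.int8)

def quantize_binary(embeddings: np.ndarray) -> np.ndarray:
    """Keep only the sign of each component, packed 8 dimensions per byte."""
    return np.packbits(embeddings > 0, axis=-1)

def hamming_distance(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Number of differing bits between every row of `a` and every row of `b`."""
    xor = np.bitwise_xor(a[:, None, :], b[None, :, :])
    return np.unpackbits(xor, axis=-1).sum(axis=-1)

# Random stand-ins for a calibration sample (hypothetical 1024-dim embeddings)
rng = np.random.default_rng(0)
calibration = rng.uniform(-0.2, 0.2, size=(1000, 1024)).astype(np.float32)
embeddings = calibration[:3]

ranges = calibrate_ranges(calibration)
int8_embeds = quantize_int8(embeddings, ranges)
binary_embeds = quantize_binary(embeddings)

print(int8_embeds.dtype, int8_embeds.shape)
print(binary_embeds.dtype, binary_embeds.shape)
print(hamming_distance(binary_embeds, binary_embeds))
```

Calibrated ranges use the full `int8` resolution for the values that actually occur, whereas a fixed range of -1.0 to +1.0 may leave most of the 256 levels unused. For binary vectors, a lower Hamming distance indicates higher similarity.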
# Evaluation
The checkpoint used to produce the scores presented in the paper [is here](https://huggingface.co/MongoDB/mdbr-leaf-mt/commit/ea98995e96beac21b820aa8ad9afaa6fd29b243d).
# Citation
If you use this model in your work, please cite:
```bibtex
@article{mdb_leaf,
title = {LEAF: Lightweight Embedding Alignment Knowledge Distillation Framework},
author = {Robin Vujanic and Thomas Rueckstiess},
year = {2025},
eprint = {TBD},
archiveprefix = {arXiv},
primaryclass = {FILL HERE},
url = {FILL HERE}
}
```
# License
This model is released under the Apache 2.0 License.
# Contact
For questions or issues, please open an issue or pull request. You can also contact the MongoDB ML Research team at [email protected].