# DMRetriever: A Family of Models for Improved Text Retrieval in Disaster Management
This repository provides an overview of DMRetriever, a family of embedding models designed for text retrieval in disaster management.
For details, please refer to the paper and the GitHub repository.
DMRetriever includes model variants with 33M, 109M, 335M, 596M, 4B, and 7.6B parameters.
These models are trained via a three-stage learning framework consisting of:
- Bidirectional Attention Adaptation
- Unsupervised Contrastive Pre-training
- Difficulty-aware Progressive Instruction Fine-tuning
All stages leverage high-quality data generated through an advanced data-refinement pipeline.
DMRetriever achieves state-of-the-art (SOTA) performance across six retrieval intents at all model scales.
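The full training recipe is described in the paper. As a rough illustration of the second stage (unsupervised contrastive pre-training), the sketch below shows a standard in-batch InfoNCE loss of the kind commonly used to pre-train embedding models; it is a generic example rather than DMRetriever's exact implementation, and the temperature value and the use of in-batch negatives are assumptions.

```python
import torch
import torch.nn.functional as F

def info_nce_loss(query_emb: torch.Tensor, passage_emb: torch.Tensor,
                  temperature: float = 0.05) -> torch.Tensor:
    """In-batch InfoNCE contrastive loss.

    query_emb, passage_emb: (batch, dim) tensors where row i of each forms a
    positive pair; all other rows in the batch serve as negatives.
    """
    q = F.normalize(query_emb, dim=-1)
    p = F.normalize(passage_emb, dim=-1)
    logits = q @ p.T / temperature                       # (batch, batch) similarity matrix
    labels = torch.arange(q.size(0), device=q.device)    # positives lie on the diagonal
    return F.cross_entropy(logits, labels)
```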
## Dataset
The training data are publicly available as the DMRetriever_MTT dataset.
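Assuming the data are hosted on the Hugging Face Hub, they can be inspected with the `datasets` library roughly as follows; `<org>/DMRetriever_MTT` is a placeholder for the actual repository ID, and the split name is an assumption.

```python
from datasets import load_dataset

# Placeholder repository ID; substitute the actual Hub path of DMRetriever_MTT.
ds = load_dataset("<org>/DMRetriever_MTT", split="train")

print(ds)      # column names and number of rows
print(ds[0])   # one training example
```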
## Evaluation
The tables below report performance across the six retrieval intents of the DisastIR-Test benchmark. The evaluation is conducted using this code.
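The linked code defines the official evaluation protocol. Purely as an illustration of the general shape of such an evaluation (encode the corpus and queries, rank by similarity, score the ranking), the sketch below computes nDCG@10 for one query; the metric choice and data layout here are assumptions, not the benchmark's specification.

```python
import numpy as np

def rank_corpus(query_emb: np.ndarray, corpus_embs: np.ndarray, corpus_ids: list) -> list:
    """Rank corpus documents by dot-product similarity (embeddings assumed L2-normalised)."""
    scores = corpus_embs @ query_emb
    return [corpus_ids[i] for i in np.argsort(-scores)]

def ndcg_at_10(ranked_ids: list, qrels: dict) -> float:
    """nDCG@10 for one query; `qrels` maps doc id -> graded relevance."""
    dcg = sum(qrels.get(d, 0) / np.log2(i + 2) for i, d in enumerate(ranked_ids[:10]))
    ideal = sorted(qrels.values(), reverse=True)[:10]
    idcg = sum(g / np.log2(i + 2) for i, g in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0
```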
### Small Size (≤109M)
| Model | Scale | QA | QAdoc | TW | FC | NLI | STS | Avg. |
|---|---|---|---|---|---|---|---|---|
| thenlper-gte-small | 33M | 18.04 | 9.13 | 10.95 | 49.63 | 37.51 | 55.55 | 30.14 |
| arctic-embed-m | 109M | 33.15 | 14.04 | 8.48 | 35.07 | 38.67 | 56.20 | 30.94 |
| thenlper-gte-base | 109M | 9.18 | 5.42 | 37.91 | 60.45 | 42.52 | 46.07 | 33.59 |
| arctic-embed-m-v1.5 | 109M | 25.76 | 30.41 | 17.95 | 47.97 | 42.88 | 64.16 | 38.19 |
| arctic-embed-s | 33M | 38.58 | 28.81 | 21.33 | 47.21 | 39.85 | 66.96 | 40.46 |
| bge-small-en-v1.5 | 33M | 56.91 | 51.19 | 25.15 | 55.17 | 32.87 | 64.54 | 47.64 |
| bge-base-en-v1.5 | 109M | 51.50 | 52.78 | 46.72 | 59.93 | 41.16 | 68.63 | 53.45 |
| DMRetriever-33M (ours) | 33M | 62.47† | 57.03† | 57.22† | 60.81† | 46.56† | 67.57 | 58.61† |
| DMRetriever-109M (ours) | 109M | 63.19† | 59.55† | 58.88† | 62.48† | 46.93† | 68.79† | 59.97† |
### Medium Size (137M–335M)
| Model | Scale | QA | QAdoc | TW | FC | NLI | STS | Avg. |
|---|---|---|---|---|---|---|---|---|
| arctic-embed-m-long | 137M | 21.51 | 10.86 | 19.24 | 36.13 | 41.67 | 54.94 | 30.73 |
| arctic-embed-l | 335M | 40.56 | 30.19 | 14.98 | 32.64 | 34.20 | 56.10 | 34.78 |
| bge-large-en-v1.5 | 335M | 56.76 | 54.45 | 32.20 | 54.90 | 35.11 | 64.47 | 49.65 |
| gte-base-en-v1.5 | 137M | 60.51 | 55.62 | 46.26 | 52.24 | 39.59 | 70.40 | 54.10 |
| mxbai-embed-large-v1 | 335M | 64.24 | 62.63 | 39.94 | 58.12 | 40.18 | 68.01 | 55.52 |
| arctic-embed-m-v2.0 | 305M | 61.22 | 62.20 | 47.01 | 57.79 | 42.29 | 64.51 | 55.84 |
| DMRetriever-335M (ours) | 335M | 67.44† | 62.69† | 62.16† | 64.42† | 49.69† | 70.71† | 62.85† |
### Large Size (434M–1.5B)
| Model | Scale | QA | QAdoc | TW | FC | NLI | STS | Avg. |
|---|---|---|---|---|---|---|---|---|
| arctic-embed-l-v2.0 | 568M | 55.23 | 59.11 | 38.11 | 60.10 | 41.07 | 62.61 | 52.70 |
| gte-large-en-v1.5 | 434M | 67.37 | 58.18 | 39.43 | 52.66 | 34.45 | 66.47 | 53.09 |
| Qwen3-Embedding-0.6B | 596M | 66.10 | 52.31 | 62.38 | 64.89 | 50.30 | 67.39 | 60.56 |
| multilingual-e5-large-instruct | 560M | 67.97 | 64.64 | 62.25 | 66.78 | 48.51 | 63.42 | 62.26 |
| multilingual-e5-large | 560M | 66.99 | 64.01 | 62.81 | 59.87 | 50.93 | 74.12 | 63.12 |
| gte-Qwen2-1.5B-instruct | 1.5B | 69.85 | 59.17 | 65.09 | 62.73 | 55.51 | 73.58 | 64.32 |
| inf-retriever-v1-1.5b | 1.5B | 69.41 | 64.29 | 62.99 | 65.39 | 54.03 | 73.92 | 65.01 |
| DMRetriever-596M (ours) | 596M | 72.44† | 67.50† | 65.79† | 69.15† | 55.71† | 74.73† | 67.55† |
### XL Size (≥4B)
| Model | Scale | QA | QAdoc | TW | FC | NLI | STS | Avg. |
|---|---|---|---|---|---|---|---|---|
| Qwen3-Embedding-8B | 7.6B | 44.21 | 34.38 | 41.56 | 42.04 | 32.53 | 42.95 | 39.61 |
| gte-Qwen2-7B-instruct | 7.6B | 70.24 | 47.41 | 63.08 | 31.62 | 53.71 | 74.88 | 56.82 |
| NV-Embed-v1 | 7.9B | 68.06 | 62.70 | 56.02 | 59.64 | 48.05 | 67.06 | 60.26 |
| Qwen3-Embedding-4B | 4B | 67.20 | 59.14 | 65.28 | 67.16 | 53.61 | 58.51 | 61.82 |
| e5-mistral-7b-instruct | 7.1B | 65.57 | 64.97 | 63.31 | 67.86 | 47.55 | 66.48 | 62.58 |
| NV-Embed-v2 | 7.9B | 74.47 | 69.37 | 42.40 | 68.32 | 58.20 | 76.07 | 64.80 |
| inf-retriever-v1 | 7.1B | 72.84 | 66.74 | 66.23 | 65.53 | 51.86 | 75.98 | 66.53 |
| SFR-Embedding-Mistral | 7.1B | 71.41 | 67.14 | 69.45 | 70.31 | 50.93 | 72.67 | 66.99 |
| Linq-Embed-Mistral | 7.1B | 74.40 | 70.31 | 64.11 | 70.64 | 52.46 | 71.25 | 67.19 |
| DMRetriever-4B (ours) | 4B | 75.32† | 70.23† | 70.55† | 71.44† | 57.63 | 77.38† | 70.42† |
| DMRetriever-7.6B (ours) | 7.6B | 76.19† | 71.27† | 71.11† | 72.47† | 58.81† | 78.36† | 71.37† |
## DMRetriever Series Model List
| Model | Description | Backbone | Backbone Type | Hidden Size | #Layers |
|---|---|---|---|---|---|
| DMRetriever-33M | Base 33M variant | MiniLM | Encoder-only | 384 | 12 |
| DMRetriever-33M-PT | Pre-trained version of 33M | MiniLM | Encoder-only | 384 | 12 |
| DMRetriever-109M | Base 109M variant | BERT-base-uncased | Encoder-only | 768 | 12 |
| DMRetriever-109M-PT | Pre-trained version of 109M | BERT-base-uncased | Encoder-only | 768 | 12 |
| DMRetriever-335M | Base 335M variant | BERT-large-uncased-WWM | Encoder-only | 1024 | 24 |
| DMRetriever-335M-PT | Pre-trained version of 335M | BERT-large-uncased-WWM | Encoder-only | 1024 | 24 |
| DMRetriever-596M | Base 596M variant | Qwen3-0.6B | Decoder-only | 1024 | 28 |
| DMRetriever-596M-PT | Pre-trained version of 596M | Qwen3-0.6B | Decoder-only | 1024 | 28 |
| DMRetriever-4B | Base 4B variant | Qwen3-4B | Decoder-only | 2560 | 36 |
| DMRetriever-4B-PT | Pre-trained version of 4B | Qwen3-4B | Decoder-only | 2560 | 36 |
| DMRetriever-7.6B | Base 7.6B variant | Qwen3-8B | Decoder-only | 4096 | 36 |
| DMRetriever-7.6B-PT | Pre-trained version of 7.6B | Qwen3-8B | Decoder-only | 4096 | 36 |
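As the table shows, the 33M–335M variants use encoder-only backbones while the 596M–7.6B variants are decoder-only. Embedding extraction typically differs between the two: mean pooling is a common choice for encoder-only models and last-token pooling for decoder-only ones. The sketch below illustrates both; whether DMRetriever uses exactly these pooling schemes is an assumption here, so follow each model's page for the authoritative procedure.

```python
import torch

def mean_pool(last_hidden: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    """Average the hidden states of non-padding tokens (common for encoder-only backbones)."""
    mask = attention_mask.unsqueeze(-1).float()                   # (batch, seq, 1)
    return (last_hidden * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)

def last_token_pool(last_hidden: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    """Take the hidden state of the final non-padding token (common for decoder-only backbones)."""
    last_idx = attention_mask.sum(dim=1) - 1                      # index of last real token per row
    return last_hidden[torch.arange(last_hidden.size(0)), last_idx]
```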
## Usage
Please refer to each model's Hugging Face page for specific usage instructions, including input format, embedding extraction, and evaluation examples.
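As a minimal sketch of what such usage often looks like with `transformers`, the snippet below encodes one query and two passages and ranks the passages by cosine similarity. The model ID is a placeholder, and mean pooling with no instruction prefix is an assumption; defer to the model page for the exact input format.

```python
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

MODEL_ID = "DMRetriever-109M"  # placeholder; use the full Hugging Face path from the model page

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModel.from_pretrained(MODEL_ID).eval()

query = "What immediate steps should be taken after a flood warning?"
passages = [
    "Move to higher ground immediately and avoid walking through moving water.",
    "Volcanic ash can disrupt air travel for several days.",
]

with torch.no_grad():
    batch = tokenizer([query] + passages, padding=True, truncation=True, return_tensors="pt")
    hidden = model(**batch).last_hidden_state
    mask = batch["attention_mask"].unsqueeze(-1).float()
    emb = F.normalize((hidden * mask).sum(dim=1) / mask.sum(dim=1), dim=-1)  # mean pooling (assumed)

scores = emb[0] @ emb[1:].T   # cosine similarity of the query against each passage
print(scores.tolist())
```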
## Citation
If you find this repository helpful, please consider citing the corresponding paper:
@article{yin2025dmretriever,
title={DMRetriever: A Family of Models for Improved Text Retrieval in Disaster Management},
author={Yin, Kai and Dong, Xiangjue and Liu, Chengkai and Lin, Allen and Shi, Lingfeng and Mostafavi, Ali and Caverlee, James},
journal={arXiv preprint arXiv:2510.15087},
year={2025}
}