metadata
license: mit
datasets:
- mteb/scifact
language:
- en
pipeline_tag: text-retrieval
library_name: sentence-transformers
tags:
- mteb
- text
- transformers
- text-embeddings-inference
- CSR
model-index:
- name: CSR
  results:
  - dataset:
      name: MTEB SciFact
      type: mteb/scifact
      revision: 0228b52cf27578f30900b9e5271d331663a030d7
      config: default
      split: test
      languages:
      - eng-Latn
    metrics:
    - type: ndcg@1
      value: 0.59333
    - type: ndcg@3
      value: 0.65703
    - type: ndcg@5
      value: 0.67072
    - type: ndcg@10
      value: 0.68412
    - type: ndcg@20
      value: 0.69238
    - type: ndcg@100
      value: 0.70514
    - type: ndcg@1000
      value: 0.71517
    - type: map@1
      value: 0.5675
    - type: map@3
      value: 0.63602
    - type: map@5
      value: 0.64712
    - type: map@10
      value: 0.65301
    - type: map@20
      value: 0.65552
    - type: map@100
      value: 0.65778
    - type: map@1000
      value: 0.65815
    - type: recall@1
      value: 0.5675
    - type: recall@3
      value: 0.69772
    - type: recall@5
      value: 0.73367
    - type: recall@10
      value: 0.77333
    - type: recall@20
      value: 0.80367
    - type: recall@100
      value: 0.86667
    - type: recall@1000
      value: 0.945
    - type: precision@1
      value: 0.59333
    - type: precision@3
      value: 0.25667
    - type: precision@5
      value: 0.164
    - type: precision@10
      value: 0.08667
    - type: precision@20
      value: 0.04533
    - type: precision@100
      value: 0.0099
    - type: precision@1000
      value: 0.00107
    - type: mrr@1
      value: 0.59333
    - type: mrr@3
      value: 0.64667
    - type: mrr@5
      value: 0.65333
    - type: mrr@10
      value: 0.65883
    - type: mrr@20
      value: 0.66105
    - type: mrr@100
      value: 0.66254
    - type: mrr@1000
      value: 0.66292
    - type: main_score
      value: 0.68412
    task:
      type: Retrieval
For more details, including benchmark evaluation, hardware requirements, and inference performance, please refer to our GitHub repository.
Usage
📌 Tip: For NV-Embed-V2, Transformers versions later than 4.47.0 may degrade performance, because model_type=bidir_mistral in config.json is no longer supported. We recommend using Transformers 4.47.0.
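If you want a script to catch an incompatible environment before loading the model, the helper below is one possible sketch. It is hypothetical and not part of this repository: it reads the installed Transformers version with Python's standard importlib.metadata and warns when that version is later than 4.47.0. Pinning the version directly (pip install transformers==4.47.0) achieves the same goal.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Version recommended by this model card for NV-Embed-V2
RECOMMENDED = (4, 47, 0)


def release_tuple(v: str) -> tuple:
    """Numeric release components of a version string, e.g. '4.48.0rc1' -> (4, 48, 0)."""
    parts = []
    for piece in v.split("."):
        m = re.match(r"\d+", piece)
        if m is None:
            break  # stop at non-numeric segments such as 'dev0'
        parts.append(int(m.group()))
        if m.group() != piece:
            break  # stop after a segment with a suffix such as '0rc1'
    return tuple(parts)


def is_newer_than_recommended(v: str) -> bool:
    """True if the version string is later than 4.47.0 (tuple comparison)."""
    return release_tuple(v) > RECOMMENDED


def check_transformers() -> None:
    """Hypothetical helper: print a warning if the installed Transformers is too new."""
    try:
        installed = version("transformers")
    except PackageNotFoundError:
        print("transformers is not installed; try: pip install transformers==4.47.0")
        return
    if is_newer_than_recommended(installed):
        print(f"Warning: transformers {installed} is newer than 4.47.0; "
              "NV-Embed-V2 results may degrade.")
    else:
        print(f"transformers {installed} is not newer than 4.47.0.")


check_transformers()
```

Comparing numeric tuples rather than raw strings avoids ordering mistakes such as "4.100.0" sorting before "4.47.0".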
Sentence Transformers Usage
You can load this model with Sentence Transformers and evaluate it on SciFact with MTEB using the following code snippet:
import mteb
from sentence_transformers import SparseEncoder

# Load the CSR sparse encoder; trust_remote_code=True lets the repository's custom model code run
model = SparseEncoder(
    "Y-Research-Group/CSR-NV_Embed_v2-Retrieval-SciFACT",
    trust_remote_code=True,
)

# Instruction prompt applied to SciFact queries
model.prompts = {
    "SciFact-query": "Instruct: Given a scientific claim, retrieve documents that support or refute the claim\nQuery:"
}

task = mteb.get_tasks(tasks=["SciFact"])
evaluation = mteb.MTEB(tasks=task)
evaluation.run(
    model,
    eval_splits=["test"],
    output_folder="./results/SciFact",
    show_progress_bar=True,
    # MTEB does not support sparse tensors yet, so return dense tensors instead
    encode_kwargs={"convert_to_sparse_tensor": False, "batch_size": 8},
)
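Setting convert_to_sparse_tensor=False should change only how the embeddings are stored, not the dot-product similarity scores MTEB computes from them. The plain-Python sketch below illustrates this; it is a conceptual example, not SparseEncoder's internal implementation, and the dict-based {dimension: value} vector format is an assumption chosen for readability.

```python
from typing import Dict, List

# Assumed illustrative format: a sparse embedding stored as {dimension_index: value}
SparseVec = Dict[int, float]


def to_dense(vec: SparseVec, dim: int) -> List[float]:
    """Expand a sparse vector into a dense list of length `dim` (zeros elsewhere)."""
    dense = [0.0] * dim
    for idx, val in vec.items():
        dense[idx] = val
    return dense


def sparse_dot(a: SparseVec, b: SparseVec) -> float:
    """Dot product that only visits dimensions non-zero in both vectors."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the smaller vector
    return sum(val * b[idx] for idx, val in a.items() if idx in b)


def dense_dot(a: List[float], b: List[float]) -> float:
    """Ordinary dot product over every dimension."""
    return sum(x * y for x, y in zip(a, b))


query = {2: 0.5, 7: 1.0}
doc = {2: 2.0, 5: 3.0, 7: 0.25}
dim = 8

print(sparse_dot(query, doc))                                # 1.25
print(dense_dot(to_dense(query, dim), to_dense(doc, dim)))   # 1.25
```

Both paths give the same score; the trade-off is that dense tensors occupy memory for every dimension, including the zeros, which is one reason the example above keeps batch_size small.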
Citation
@inproceedings{wenbeyond,
title={Beyond Matryoshka: Revisiting Sparse Coding for Adaptive Representation},
author={Wen, Tiansheng and Wang, Yifei and Zeng, Zequn and Peng, Zhong and Su, Yudi and Liu, Xinyang and Chen, Bo and Liu, Hongwei and Jegelka, Stefanie and You, Chenyu},
booktitle={Forty-second International Conference on Machine Learning}
}