CSR-Embedding Collection

This is a collection of CSR embedding models.
For more details, including benchmark evaluation, hardware requirements, and inference performance, please refer to our GitHub repository.
📌 Tip: For NV-Embed-V2, using Transformers versions later than 4.47.0 may lead to performance degradation, as `model_type=bidir_mistral` in `config.json` is no longer supported. We recommend using Transformers 4.47.0.
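The version pin can be applied at install time, for example (package list beyond `transformers` is illustrative):

```shell
# Pin Transformers to the recommended version; install the other
# packages used in the evaluation snippet alongside it.
pip install "transformers==4.47.0" sentence-transformers mteb
```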
You can evaluate this model, loaded with Sentence Transformers, using the following code snippet:
```python
import mteb
from sentence_transformers import SparseEncoder

model = SparseEncoder("Y-Research-Group/CSR-NV_Embed_v2-Retrieval-NFcorpus", trust_remote_code=True)
model.prompts = {
    "NFCorpus-query": "Instruct: Given a question, retrieve relevant documents that answer the question\nQuery:"
}
task = mteb.get_tasks(tasks=["NFCorpus"])
evaluation = mteb.MTEB(tasks=task)
evaluation.run(
    model,
    eval_splits=["test"],
    output_folder="./results/NFCorpus",
    show_progress_bar=True,
    # MTEB doesn't support sparse tensors yet, so convert embeddings to dense tensors
    encode_kwargs={"convert_to_sparse_tensor": False, "batch_size": 8},
)
```
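The `convert_to_sparse_tensor=False` flag matters because MTEB consumes dense arrays, while a CSR-style encoder produces embeddings with only a few non-zero activations. A minimal sketch of that dense/sparse relationship using `scipy.sparse` (the values below are hypothetical, not actual model outputs):

```python
import numpy as np
from scipy.sparse import csr_matrix

# Hypothetical sparse embeddings: 2 documents in an 8-dim space,
# each with a few non-zero activations, as a top-k sparse code would produce.
dense = np.array([
    [0.0, 0.9, 0.0, 0.0, 0.0, 0.3, 0.0, 0.0],
    [0.0, 0.0, 1.2, 0.0, 0.0, 0.0, 0.0, 0.4],
])
sparse_emb = csr_matrix(dense)

# CSR stores only the non-zero values plus index arrays
print(sparse_emb.nnz)  # 4 stored values instead of 16

# Evaluation frameworks that expect dense inputs need the densified view back
recovered = sparse_emb.toarray()
sparsity = (recovered == 0).mean()
print(sparsity)  # 0.75
```

Densifying trades memory for compatibility, which is why the snippet above also lowers `batch_size`.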
```bibtex
@misc{wen2025matryoshkarevisitingsparsecoding,
    title={Beyond Matryoshka: Revisiting Sparse Coding for Adaptive Representation},
    author={Tiansheng Wen and Yifei Wang and Zequn Zeng and Zhong Peng and Yudi Su and Xinyang Liu and Bo Chen and Hongwei Liu and Stefanie Jegelka and Chenyu You},
    year={2025},
    eprint={2503.01776},
    archivePrefix={arXiv},
    primaryClass={cs.LG},
    url={https://arxiv.org/abs/2503.01776},
}
```
Base model: nvidia/NV-Embed-v2