---
license: mit
datasets:
- mteb/mtop_intent
language:
- en
pipeline_tag: text-classification
library_name: sentence-transformers
tags:
- mteb
- text
- transformers
- text-embeddings-inference
- sparse-encoder
- sparse
- csr
model-index:
- name: CSR
  results:
  - dataset:
      name: MTEB MTOPIntentClassification (en)
      type: mteb/mtop_intent
      revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba
      config: en
      split: test
      languages:
      - eng-Latn
    metrics:
    - type: accuracy
      value: 0.906407
    - type: f1
      value: 0.694457
    - type: f1_weighted
      value: 0.917326
    - type: main_score
      value: 0.906407
    task:
      type: Classification
  - dataset:
      name: MTEB MTOPIntentClassification (de)
      type: mteb/mtop_intent
      revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba
      config: de
      split: test
      languages:
      - deu-Latn
    metrics:
    - type: accuracy
      value: 0.851
    - type: f1
      value: 0.601279
    - type: f1_weighted
      value: 0.863969
    - type: main_score
      value: 0.851
    task:
      type: Classification
  - dataset:
      name: MTEB MTOPIntentClassification (es)
      type: mteb/mtop_intent
      revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba
      config: es
      split: test
      languages:
      - spa-Latn
    metrics:
    - type: accuracy
      value: 0.906738
    - type: f1
      value: 0.642295
    - type: f1_weighted
      value: 0.910882
    - type: main_score
      value: 0.906738
    task:
      type: Classification
  - dataset:
      name: MTEB MTOPIntentClassification (fr)
      type: mteb/mtop_intent
      revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba
      config: fr
      split: test
      languages:
      - fra-Latn
    metrics:
    - type: accuracy
      value: 0.849045
    - type: f1
      value: 0.59923
    - type: f1_weighted
      value: 0.863301
    - type: main_score
      value: 0.849045
    task:
      type: Classification
  - dataset:
      name: MTEB MTOPIntentClassification (hi)
      type: mteb/mtop_intent
      revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba
      config: hi
      split: test
      languages:
      - hin-Deva
    metrics:
    - type: accuracy
      value: 0.751094
    - type: f1
      value: 0.44095
    - type: f1_weighted
      value: 0.762567
    - type: main_score
      value: 0.751094
    task:
      type: Classification
  - dataset:
      name: MTEB MTOPIntentClassification (th)
      type: mteb/mtop_intent
      revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba
      config: th
      split: test
      languages:
      - tha-Thai
    metrics:
    - type: accuracy
      value: 0.75566
    - type: f1
      value: 0.498529
    - type: f1_weighted
      value: 0.76994
    - type: main_score
      value: 0.75566
    task:
      type: Classification
base_model:
- nvidia/NV-Embed-v2
---

For more details, including benchmark evaluation, hardware requirements, and inference performance, please refer to our [GitHub repository](https://github.com/neilwen987/CSR_Adaptive_Rep).

## Usage

📌 **Tip**: For NV-Embed-v2, Transformers versions **later** than 4.47.0 may lead to performance degradation, because ``model_type=bidir_mistral`` in ``config.json`` is no longer supported. We recommend using ``transformers==4.47.0``.

### Sentence Transformers Usage

You can evaluate this model with Sentence Transformers on MTEB using the following code snippet:

```python
import mteb
from sentence_transformers import SparseEncoder

model = SparseEncoder(
    "Y-Research-Group/CSR-NV_Embed_v2-Classification-MTOPIntent",
    trust_remote_code=True
)
model.prompts = {
    "MTOPIntentClassification": "Instruct: Classify the intent of the given utterance in task-oriented conversation\nQuery:"
}

task = mteb.get_tasks(tasks=["MTOPIntentClassification"])
evaluation = mteb.MTEB(tasks=task)
evaluation.run(
    model,
    eval_splits=["test"],
    output_folder="./results/MTOPIntentClassification",
    show_progress_bar=True,
    # MTEB does not support sparse tensors yet, so convert to dense tensors
    encode_kwargs={"convert_to_sparse_tensor": False, "batch_size": 8},
)
```
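Outside the MTEB harness, you can also encode utterances directly with the same `SparseEncoder`. The snippet below is a minimal sketch, not part of the original evaluation: the example utterances are made up, and it simply reuses the MTOPIntent instruction from the snippet above as the encoding prompt.

```python
from sentence_transformers import SparseEncoder

model = SparseEncoder(
    "Y-Research-Group/CSR-NV_Embed_v2-Classification-MTOPIntent",
    trust_remote_code=True,
)

# Illustrative utterances (not from the MTOP Intent dataset)
utterances = [
    "set an alarm for 7 am tomorrow",
    "play some jazz music for me",
]

# Reuse the classification instruction from the evaluation snippet as the prompt;
# dense output (convert_to_sparse_tensor=False) is easier to inspect downstream.
embeddings = model.encode(
    utterances,
    prompt="Instruct: Classify the intent of the given utterance in task-oriented conversation\nQuery:",
    convert_to_sparse_tensor=False,
    batch_size=8,
)

print(embeddings.shape)            # (2, embedding_dim)
print((embeddings != 0).sum(-1))   # number of active (non-zero) dimensions per utterance
print(model.similarity(embeddings, embeddings))  # pairwise similarity between the utterances
```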
## Citation

```bibtex
@inproceedings{wenbeyond,
  title={Beyond Matryoshka: Revisiting Sparse Coding for Adaptive Representation},
  author={Wen, Tiansheng and Wang, Yifei and Zeng, Zequn and Peng, Zhong and Su, Yudi and Liu, Xinyang and Chen, Bo and Liu, Hongwei and Jegelka, Stefanie and You, Chenyu},
  booktitle={Forty-second International Conference on Machine Learning}
}
```