multilingual-e5-large Legal Matryoshka

This is a sentence-transformers model fine-tuned from intfloat/multilingual-e5-large on the json dataset. It maps sentences and paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more. Because it was trained with MatryoshkaLoss, its embeddings were also optimized for truncation to 768, 512, 256, 128, or 64 dimensions.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: intfloat/multilingual-e5-large
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 1024 dimensions
  • Similarity Function: Cosine Similarity
  • Training Dataset:
    • json
  • Language: en
  • License: apache-2.0

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: XLMRobertaModel 
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
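
The Pooling and Normalize modules listed above can be illustrated with a small standalone sketch. It is not the library's implementation: the helper names `mean_pool` and `l2_normalize` are hypothetical, and the toy input uses 4 dimensions instead of 1024. It shows mean pooling over non-padding tokens followed by scaling to unit length.

```python
import math


def mean_pool(token_vectors, attention_mask):
    """Average token vectors, counting only positions where the mask is 1."""
    dim = len(token_vectors[0])
    kept = [vec for vec, m in zip(token_vectors, attention_mask) if m == 1]
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


def l2_normalize(vector):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


# Toy input: three token positions (the last one is padding), 4 dimensions.
tokens = [[1.0, 2.0, 0.0, 0.0], [3.0, 0.0, 0.0, 4.0], [9.0, 9.0, 9.0, 9.0]]
mask = [1, 1, 0]

pooled = mean_pool(tokens, mask)      # padding row is ignored
embedding = l2_normalize(pooled)      # unit-length sentence embedding
print(pooled)
print(embedding)
```

Because the final vectors have unit length, the cosine similarity between two embeddings reduces to their dot product.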

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("IoannisKat1/multilingual-e5-large-legal-matryoshka")
# Run inference
sentences = [
    'How long is the period that can be added due to the complexity of the subject-matter?',
    '1.In order to ensure the correct and consistent application of this Regulation in individual cases, the Board shall adopt a binding decision in the following cases: (a)  where, in a case referred to in Article 60(4), a supervisory authority concerned has raised a relevant and reasoned objection to a draft decision of the lead authority or the lead authority has rejected such an objection as being not relevant or reasoned. The binding decision shall concern all the matters which are the subject of the relevant and reasoned objection, in particular whether there is an infringement of this Regulation; 4.5.2016 L 119/74   (b)  where there are conflicting views on which of the supervisory authorities concerned is competent for the main establishment; (c)  where a competent supervisory authority does not request the opinion of the Board in the cases referred to in Article 64(1), or does not follow the opinion of the Board issued under Article 64. In that case, any supervisory authority concerned or the Commission may communicate the matter to the Board.\n2.The decision referred to in paragraph 1 shall be adopted within one month from the referral of the subject-matter by a two-thirds majority of the members of the Board. That period may be extended by a further month on account of the complexity of the subject-matter. The decision referred to in paragraph 1 shall be reasoned and addressed to the lead supervisory authority and all the supervisory authorities concerned and binding on them.\n3.Where the Board has been unable to adopt a decision within the periods referred to in paragraph 2, it shall adopt its decision within two weeks following the expiration of the second month referred to in paragraph 2 by a simple majority of the members of the Board. Where the members of the Board are split, the decision shall by adopted by the vote of its Chair.\n4.The supervisory authorities concerned shall not adopt a decision on the subject matter submitted to the Board under paragraph 1 during the periods referred to in paragraphs 2 and 3\n5.The Chair of the Board shall notify, without undue delay, the decision referred to in paragraph 1 to the supervisory authorities concerned. It shall inform the Commission thereof. The decision shall be published on the website of the Board without delay after the supervisory authority has notified the final decision referred to in paragraph 6\n6.The lead supervisory authority or, as the case may be, the supervisory authority with which the complaint has been lodged shall adopt its final decision on the basis of the decision referred to in paragraph 1 of this Article, without undue delay and at the latest by one month after the Board has notified its decision. The lead supervisory authority or, as the case may be, the supervisory authority with which the complaint has been lodged, shall inform the Board of the date when its final decision is notified respectively to the controller or the processor and to the data subject. The final decision of the supervisory authorities concerned shall be adopted under the terms of Article 60(7), (8) and (9). The final decision shall refer to the decision referred to in paragraph 1 of this Article and shall specify that the decision referred to in that paragraph will be published on the website of the Board in accordance with paragraph 5 of this Article. The final decision shall attach the decision referred to in paragraph 1 of this Article.',
    "Each supervisory authority not acting as the lead supervisory authority should be competent to handle local cases where the controller or processor is established in more than one Member State, but the subject matter of the specific processing concerns only processing carried out in a single Member State and involves only data subjects in that single Member State, for example, where the subject matter concerns the processing of employees' personal data in the specific employment context of a Member State. In such cases, the supervisory authority should inform the lead supervisory authority without delay about the matter. After being informed, the lead supervisory authority should decide, whether it will handle the case pursuant to the provision on cooperation between the lead supervisory authority and other supervisory authorities concerned (‘one-stop-shop mechanism’), or whether the supervisory authority which informed it should handle the case at local level. When deciding whether it will handle the case, the lead supervisory authority should take into account whether there is an establishment of the controller or processor in the Member State of the supervisory authority which informed it in order to ensure effective enforcement of a decision vis-à-vis the controller or processor. Where the lead supervisory authority decides to handle the case, the supervisory authority which informed it should have the 4.5.2016 L 119/23 Official Journal of the European Union EN   possibility to submit a draft for a decision, of which the lead supervisory authority should take utmost account when preparing its draft decision in that one-stop-shop mechanism.",
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 1024]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
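
Since the model was trained with MatryoshkaLoss, a shorter prefix of each embedding can be used to save storage and search time. The sketch below is a standalone illustration on toy 4-dimensional vectors, not output from this model; the helpers `truncate_and_normalize` and `cosine` are hypothetical names. It shows the usual recipe: keep the first `dim` values, then re-normalize before comparing. Sentence Transformers also accepts a `truncate_dim` argument when constructing `SentenceTransformer`, which should achieve the same truncation inside the library.

```python
import math


def truncate_and_normalize(embedding, dim):
    """Keep the first `dim` values and rescale them to unit length."""
    head = embedding[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head]


def cosine(a, b):
    """Dot product; equals cosine similarity for unit-length inputs."""
    return sum(x * y for x, y in zip(a, b))


# Toy unit-length vectors standing in for two full-size embeddings.
full_a = [0.5, 0.5, 0.5, 0.5]
full_b = [0.5, 0.5, -0.5, 0.5]

small_a = truncate_and_normalize(full_a, 2)
small_b = truncate_and_normalize(full_b, 2)

print(cosine(full_a, full_b))    # similarity using all dimensions
print(cosine(small_a, small_b))  # similarity using only the first two
```

Scores generally shift after truncation, so thresholds tuned at 1024 dimensions may need re-tuning at smaller sizes.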

Evaluation

Metrics

Information Retrieval (dim_1024)

Metric Value
cosine_accuracy@1 0.4798
cosine_accuracy@3 0.5101
cosine_accuracy@5 0.5429
cosine_accuracy@10 0.5833
cosine_precision@1 0.4798
cosine_precision@3 0.463
cosine_precision@5 0.4343
cosine_precision@10 0.3833
cosine_recall@1 0.0927
cosine_recall@3 0.2289
cosine_recall@5 0.3077
cosine_recall@10 0.4266
cosine_ndcg@10 0.5293
cosine_mrr@10 0.5031
cosine_map@100 0.5845

Information Retrieval (dim_768)

Metric Value
cosine_accuracy@1 0.4848
cosine_accuracy@3 0.5101
cosine_accuracy@5 0.5404
cosine_accuracy@10 0.5884
cosine_precision@1 0.4848
cosine_precision@3 0.4663
cosine_precision@5 0.4348
cosine_precision@10 0.3843
cosine_recall@1 0.0934
cosine_recall@3 0.2314
cosine_recall@5 0.3103
cosine_recall@10 0.4291
cosine_ndcg@10 0.5319
cosine_mrr@10 0.507
cosine_map@100 0.5864

Information Retrieval (dim_512)

Metric Value
cosine_accuracy@1 0.4874
cosine_accuracy@3 0.5126
cosine_accuracy@5 0.5354
cosine_accuracy@10 0.5758
cosine_precision@1 0.4874
cosine_precision@3 0.468
cosine_precision@5 0.4338
cosine_precision@10 0.3768
cosine_recall@1 0.0944
cosine_recall@3 0.2337
cosine_recall@5 0.3123
cosine_recall@10 0.4242
cosine_ndcg@10 0.5283
cosine_mrr@10 0.507
cosine_map@100 0.5822

Information Retrieval (dim_256)

Metric Value
cosine_accuracy@1 0.4798
cosine_accuracy@3 0.5076
cosine_accuracy@5 0.5404
cosine_accuracy@10 0.5884
cosine_precision@1 0.4798
cosine_precision@3 0.4621
cosine_precision@5 0.4379
cosine_precision@10 0.3912
cosine_recall@1 0.0886
cosine_recall@3 0.2158
cosine_recall@5 0.2945
cosine_recall@10 0.4202
cosine_ndcg@10 0.5299
cosine_mrr@10 0.5031
cosine_map@100 0.5755

Information Retrieval (dim_128)

Metric Value
cosine_accuracy@1 0.4798
cosine_accuracy@3 0.4975
cosine_accuracy@5 0.5328
cosine_accuracy@10 0.5732
cosine_precision@1 0.4798
cosine_precision@3 0.4596
cosine_precision@5 0.4318
cosine_precision@10 0.379
cosine_recall@1 0.0901
cosine_recall@3 0.2197
cosine_recall@5 0.298
cosine_recall@10 0.4068
cosine_ndcg@10 0.5199
cosine_mrr@10 0.4994
cosine_map@100 0.5701

Information Retrieval (dim_64)

Metric Value
cosine_accuracy@1 0.4419
cosine_accuracy@3 0.4672
cosine_accuracy@5 0.5076
cosine_accuracy@10 0.5631
cosine_precision@1 0.4419
cosine_precision@3 0.4251
cosine_precision@5 0.4005
cosine_precision@10 0.3588
cosine_recall@1 0.0834
cosine_recall@3 0.2079
cosine_recall@5 0.2837
cosine_recall@10 0.3959
cosine_ndcg@10 0.4938
cosine_mrr@10 0.4664
cosine_map@100 0.5457
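
To help read the tables above, here is a minimal sketch of how these retrieval metrics are conventionally computed for a single query. The function `ir_metrics` and the document IDs are hypothetical, and the definitions are the standard ones (binary relevance, precision divided by k), assumed rather than taken from this card's evaluation code.

```python
import math


def ir_metrics(ranked_ids, relevant_ids, k):
    """Compute accuracy, precision, recall, MRR and NDCG at cutoff k for one query."""
    hits = [1 if doc in relevant_ids else 0 for doc in ranked_ids[:k]]

    # Reciprocal rank of the first relevant document within the top k.
    mrr = 0.0
    for rank, hit in enumerate(hits, start=1):
        if hit:
            mrr = 1.0 / rank
            break

    # Discounted gain of the actual ranking versus the best possible ranking.
    dcg = sum(hit / math.log2(rank + 1) for rank, hit in enumerate(hits, start=1))
    ideal_hits = min(k, len(relevant_ids))
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))

    return {
        "accuracy": 1.0 if any(hits) else 0.0,
        "precision": sum(hits) / k,
        "recall": sum(hits) / len(relevant_ids),
        "mrr": mrr,
        "ndcg": dcg / idcg,
    }


# Hypothetical query with four relevant passages; two of them are retrieved early.
ranked = ["d3", "d7", "d1", "d9", "d4"]
relevant = {"d3", "d1", "d5", "d8"}

at1 = ir_metrics(ranked, relevant, k=1)
at3 = ir_metrics(ranked, relevant, k=3)
print(at1)
print(at3)
```

The toy query shows why accuracy@1 can be far higher than recall@1: when a query has several relevant passages, a single correct top result counts as a full hit for accuracy but recovers only a fraction of the relevant set.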

Training Details

Training Dataset

json

  • Dataset: json
  • Size: 1,580 training samples
  • Columns: anchor and positive
  • Approximate statistics based on the first 1000 samples:
             anchor          positive
    type     string          string
    min      7 tokens        27 tokens
    mean     17.31 tokens    381.69 tokens
    max      39 tokens       512 tokens
  • Samples:
    anchor: What is required when reproducing or using a work according to Article 60 of this Law?
    positive: Any person who, in contravention of the provisions of this law or of the provisions of lawfully ratified multilateral international conventions on the protection of copyright, unlawfully makes a fixation of a work or of copies, reproduces them directly or indirectly, temporarily or permanently in any form, in whole or in part, translates, adapts, alters or transforms them, or distributes them to the public by sale or other means, or possesses with the intent of distributing them, rents, performs in public, broadcasts by radio or television or any other means, communicates to the public works or copies by any means, imports copies of a work illegally produced abroad without the consent of the author and, in general, exploits works, reproductions or copies being the object of copyright or acts against the moral right of the author to decide freely on the publication and the presentation of his work to the public without additions or deletions, shall be liable to imprisonment of no less t...

    anchor: Who electronically sent the request?
    positive: Court (Civil/Criminal): Civil
    Provisions:
    Time of commission of the act:
    Outcome (not guilty, guilty):
    Rationale:
    Facts:
    The plaintiff holds credit card number ............ with the defendant banking corporation. Based on the application for alternative networks dated 19/7/2015 with number ......... submitted at a branch of the defendant, he was granted access to the electronic banking service (e-banking) to conduct banking transactions (debit, credit, updates, payments) remotely. On 30/11/2020, the plaintiff fell victim to electronic fraud through the "phishing" method, whereby an unknown perpetrator managed to withdraw a total amount of €3,121.75 from the aforementioned credit card. Specifically, the plaintiff received an email at 1:35 PM on 29/11/2020 from sender ...... with address ........, informing him that due to an impending system change, he needed to verify the mobile phone number linked to the credit card, urging him to complete the verification...

    anchor: What right should not imply the erasure of personal data needed for a contract?
    positive: To further strengthen the control over his or her own data, where the processing of personal data is carried out by automated means, the data subject should also be allowed to receive personal data concerning him or her which he or she has provided to a controller in a structured, commonly used, machine-readable and interoperable format, and to transmit it to another controller. Data controllers should be encouraged to develop interoperable formats that enable data portability. That right should apply where the data subject provided the personal data on the basis of his or her consent or the processing is necessary for the performance of a contract. It should not apply where processing is based on a legal ground other than consent or contract. By its very nature, that right should not be exercised against controllers processing personal data in the exercise of their public duties. It should therefore not apply where the processing of the personal data is necessary for compliance with a...
  • Loss: MatryoshkaLoss with these parameters:
    {
        "loss": "MultipleNegativesRankingLoss",
        "matryoshka_dims": [
            1024,
            768,
            512,
            256,
            128,
            64
        ],
        "matryoshka_weights": [
            1,
            1,
            1,
            1,
            1,
            1
        ],
        "n_dims_per_step": -1
    }
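
The configuration above can be read as: apply MultipleNegativesRankingLoss to the embeddings truncated to each listed dimension and add the results using the given weights. The following is a simplified pure-Python sketch of that idea, not the library's code. The function names are hypothetical, the toy batch uses 4-dimensional vectors with dims [4, 2], and the similarity scale of 20 is an assumption about the inner loss's default setting.

```python
import math


def normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def mnr_loss(anchors, positives, scale=20.0):
    """Cross-entropy where positive i is the target for anchor i and the
    other positives in the batch act as negatives (scale is assumed)."""
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [scale * dot(anchor, pos) for pos in positives]
        peak = max(logits)
        log_sum_exp = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_sum_exp - logits[i]
    return total / len(anchors)


def matryoshka_loss(anchors, positives, dims, weights):
    """Weighted sum of the ranking loss over truncated, re-normalized embeddings."""
    per_dim = []
    for dim in dims:
        a = [normalize(vec[:dim]) for vec in anchors]
        p = [normalize(vec[:dim]) for vec in positives]
        per_dim.append(mnr_loss(a, p))
    return sum(w * loss for w, loss in zip(weights, per_dim)), per_dim


# Toy batch of two (anchor, positive) pairs.
anchors = [[1.0, 0.0, 0.5, 0.0], [0.0, 1.0, 0.0, 0.5]]
positives = [[0.9, 0.1, 0.4, 0.0], [0.1, 0.9, 0.0, 0.4]]

total, per_dim = matryoshka_loss(anchors, positives, dims=[4, 2], weights=[1, 1])
print(total, per_dim)
```

With six dimensions and unit weights, the reported training loss is the sum of six such terms, which is one reason its absolute values are larger than those of a single ranking loss.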
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: epoch
  • gradient_accumulation_steps: 2
  • learning_rate: 2e-05
  • num_train_epochs: 15
  • lr_scheduler_type: cosine
  • warmup_ratio: 0.1
  • bf16: True
  • tf32: True
  • load_best_model_at_end: True
  • optim: adamw_torch_fused
  • batch_sampler: no_duplicates
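
These settings imply a concrete step count and learning-rate curve. The sketch below works through the arithmetic using the dataset size (1,580 samples) and per-device batch size (8) reported elsewhere in this card. It assumes a single device, assumes the warmup step count is rounded up, and uses the common linear-warmup plus cosine-decay formula; `lr_at` is a hypothetical helper, not a library function.

```python
import math

train_samples = 1580      # training set size reported in this card
per_device_batch = 8      # per_device_train_batch_size
grad_accum = 2            # gradient_accumulation_steps
epochs = 15               # num_train_epochs
warmup_ratio = 0.1
base_lr = 2e-05

batches_per_epoch = math.ceil(train_samples / per_device_batch)
steps_per_epoch = batches_per_epoch // grad_accum      # optimizer steps per epoch
total_steps = steps_per_epoch * epochs
warmup_steps = math.ceil(total_steps * warmup_ratio)   # assumption: rounded up


def lr_at(step):
    """Linear warmup to base_lr, then cosine decay to zero."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


print(steps_per_epoch, total_steps, warmup_steps)
print(lr_at(50), lr_at(warmup_steps), lr_at(total_steps))
```

The 99 optimizer steps per epoch agree with the training log, where step 99 falls at epoch 1.0. The effective batch size is 8 × 2 = 16 pairs per optimizer step under the single-device assumption.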

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: epoch
  • prediction_loss_only: True
  • per_device_train_batch_size: 8
  • per_device_eval_batch_size: 8
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 2
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 15
  • max_steps: -1
  • lr_scheduler_type: cosine
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: True
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: True
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • tp_size: 0
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss dim_1024_cosine_ndcg@10 dim_768_cosine_ndcg@10 dim_512_cosine_ndcg@10 dim_256_cosine_ndcg@10 dim_128_cosine_ndcg@10 dim_64_cosine_ndcg@10
0.1010 10 15.4718 - - - - - -
0.2020 20 14.6553 - - - - - -
0.3030 30 13.0692 - - - - - -
0.4040 40 11.9374 - - - - - -
0.5051 50 10.391 - - - - - -
0.6061 60 9.4382 - - - - - -
0.7071 70 10.3454 - - - - - -
0.8081 80 8.2018 - - - - - -
0.9091 90 8.5864 - - - - - -
1.0 99 - 0.4460 0.4412 0.4274 0.4195 0.3848 0.3259
1.0101 100 8.6625 - - - - - -
1.1111 110 4.7502 - - - - - -
1.2121 120 6.2492 - - - - - -
1.3131 130 4.8127 - - - - - -
1.4141 140 6.1704 - - - - - -
1.5152 150 6.6274 - - - - - -
1.6162 160 5.399 - - - - - -
1.7172 170 7.426 - - - - - -
1.8182 180 7.1311 - - - - - -
1.9192 190 5.946 - - - - - -
2.0 198 - 0.4016 0.4013 0.4014 0.4000 0.3593 0.3004
2.0202 200 6.2856 - - - - - -
2.1212 210 3.5988 - - - - - -
2.2222 220 4.2901 - - - - - -
2.3232 230 3.1813 - - - - - -
2.4242 240 3.5008 - - - - - -
2.5253 250 5.1129 - - - - - -
2.6263 260 4.2109 - - - - - -
2.7273 270 3.6912 - - - - - -
2.8283 280 3.3291 - - - - - -
2.9293 290 4.4076 - - - - - -
3.0 297 - 0.4627 0.4581 0.4402 0.4497 0.4050 0.3634
3.0303 300 4.7613 - - - - - -
3.1313 310 2.8859 - - - - - -
3.2323 320 2.3837 - - - - - -
3.3333 330 2.3918 - - - - - -
3.4343 340 3.2392 - - - - - -
3.5354 350 3.0618 - - - - - -
3.6364 360 3.4849 - - - - - -
3.7374 370 2.9544 - - - - - -
3.8384 380 3.7552 - - - - - -
3.9394 390 2.8886 - - - - - -
4.0 396 - 0.4759 0.4672 0.4716 0.4775 0.4500 0.4110
4.0404 400 2.6625 - - - - - -
4.1414 410 1.4966 - - - - - -
4.2424 420 1.7481 - - - - - -
4.3434 430 1.854 - - - - - -
4.4444 440 2.7405 - - - - - -
4.5455 450 2.1245 - - - - - -
4.6465 460 2.2464 - - - - - -
4.7475 470 3.2942 - - - - - -
4.8485 480 3.0603 - - - - - -
4.9495 490 2.2054 - - - - - -
5.0 495 - 0.4999 0.4997 0.4861 0.4865 0.4489 0.4270
5.0505 500 2.0539 - - - - - -
5.1515 510 1.715 - - - - - -
5.2525 520 1.214 - - - - - -
5.3535 530 1.7161 - - - - - -
5.4545 540 1.8882 - - - - - -
5.5556 550 2.0362 - - - - - -
5.6566 560 1.7486 - - - - - -
5.7576 570 2.1613 - - - - - -
5.8586 580 1.4125 - - - - - -
5.9596 590 2.3426 - - - - - -
6.0 594 - 0.5096 0.5053 0.4923 0.5008 0.4707 0.4483
6.0606 600 1.3931 - - - - - -
6.1616 610 1.5685 - - - - - -
6.2626 620 0.6951 - - - - - -
6.3636 630 2.4788 - - - - - -
6.4646 640 1.8925 - - - - - -
6.5657 650 1.3563 - - - - - -
6.6667 660 0.8888 - - - - - -
6.7677 670 1.1035 - - - - - -
6.8687 680 1.2746 - - - - - -
6.9697 690 1.3126 - - - - - -
7.0 693 - 0.5047 0.5048 0.4941 0.4856 0.4623 0.4286
7.0707 700 1.1118 - - - - - -
7.1717 710 0.5125 - - - - - -
7.2727 720 1.1709 - - - - - -
7.3737 730 1.4081 - - - - - -
7.4747 740 1.2162 - - - - - -
7.5758 750 1.5649 - - - - - -
7.6768 760 1.1022 - - - - - -
7.7778 770 1.1308 - - - - - -
7.8788 780 1.4802 - - - - - -
7.9798 790 0.789 - - - - - -
8.0 792 - 0.5137 0.5091 0.5177 0.5103 0.4781 0.4697
8.0808 800 0.9737 - - - - - -
8.1818 810 1.1912 - - - - - -
8.2828 820 0.3296 - - - - - -
8.3838 830 0.8433 - - - - - -
8.4848 840 0.7217 - - - - - -
8.5859 850 1.002 - - - - - -
8.6869 860 1.0049 - - - - - -
8.7879 870 0.2311 - - - - - -
8.8889 880 0.3288 - - - - - -
8.9899 890 0.4877 - - - - - -
9.0 891 - 0.5116 0.5005 0.5073 0.5034 0.4832 0.4694
9.0909 900 0.9198 - - - - - -
9.1919 910 0.3942 - - - - - -
9.2929 920 0.6227 - - - - - -
9.3939 930 0.4507 - - - - - -
9.4949 940 0.5001 - - - - - -
9.5960 950 0.747 - - - - - -
9.6970 960 1.1764 - - - - - -
9.7980 970 1.0748 - - - - - -
9.8990 980 0.3899 - - - - - -
10.0 990 0.3206 0.5143 0.5097 0.5149 0.5156 0.5041 0.4818
10.1010 1000 0.4675 - - - - - -
10.2020 1010 0.8518 - - - - - -
10.3030 1020 0.3375 - - - - - -
10.4040 1030 0.386 - - - - - -
10.5051 1040 0.5168 - - - - - -
10.6061 1050 1.2228 - - - - - -
10.7071 1060 0.4282 - - - - - -
10.8081 1070 0.282 - - - - - -
10.9091 1080 0.9158 - - - - - -
11.0 1089 - 0.5390 0.5374 0.5394 0.5309 0.5189 0.4965
11.0101 1090 0.1981 - - - - - -
11.1111 1100 0.2708 - - - - - -
11.2121 1110 0.36 - - - - - -
11.3131 1120 0.8882 - - - - - -
11.4141 1130 0.3434 - - - - - -
11.5152 1140 0.2293 - - - - - -
11.6162 1150 0.6078 - - - - - -
11.7172 1160 1.0283 - - - - - -
11.8182 1170 1.1603 - - - - - -
11.9192 1180 0.8614 - - - - - -
12.0 1188 - 0.5293 0.5319 0.5283 0.5299 0.5199 0.4938
12.0202 1190 0.3831 - - - - - -
12.1212 1200 0.1211 - - - - - -
12.2222 1210 0.5642 - - - - - -
12.3232 1220 0.8418 - - - - - -
12.4242 1230 0.3555 - - - - - -
12.5253 1240 0.1138 - - - - - -
12.6263 1250 0.1858 - - - - - -
12.7273 1260 0.3527 - - - - - -
12.8283 1270 0.9225 - - - - - -
12.9293 1280 0.3991 - - - - - -
13.0 1287 - 0.5228 0.5254 0.5311 0.5268 0.5189 0.4959
13.0303 1290 0.1794 - - - - - -
13.1313 1300 0.2153 - - - - - -
13.2323 1310 0.14 - - - - - -
13.3333 1320 0.5314 - - - - - -
13.4343 1330 0.2421 - - - - - -
13.5354 1340 0.2911 - - - - - -
13.6364 1350 0.4683 - - - - - -
13.7374 1360 0.3907 - - - - - -
13.8384 1370 0.7038 - - - - - -
13.9394 1380 0.0922 - - - - - -
14.0 1386 - 0.5197 0.5209 0.5271 0.5236 0.5143 0.5012
-1 -1 - 0.5293 0.5319 0.5283 0.5299 0.5199 0.4938
  • The saved checkpoint is the epoch 12.0 row (step 1188); its scores match the final evaluation row (-1).
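
One way to read the final evaluation row is to ask how much retrieval quality each truncated size keeps relative to the full 1024 dimensions. The short sketch below computes that ratio from the NDCG@10 values in the final row of the log; the variable names are illustrative only.

```python
# NDCG@10 per embedding size, copied from the final row of the training log.
final_ndcg = {1024: 0.5293, 768: 0.5319, 512: 0.5283, 256: 0.5299, 128: 0.5199, 64: 0.4938}

full = final_ndcg[1024]
retention = {dim: score / full for dim, score in final_ndcg.items()}

for dim, ratio in retention.items():
    print(f"dim {dim:>4}: {ratio:.1%} of the 1024-dimensional NDCG@10")
```

On these numbers, the 64-dimensional embeddings keep roughly 93% of the full-size NDCG@10 while being 16 times smaller, and sizes from 256 to 768 are within about 1% of the full-size score.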

Framework Versions

  • Python: 3.11.13
  • Sentence Transformers: 4.1.0
  • Transformers: 4.51.3
  • PyTorch: 2.6.0+cu124
  • Accelerate: 1.8.1
  • Datasets: 4.0.0
  • Tokenizers: 0.21.2

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MatryoshkaLoss

@misc{kusupati2024matryoshka,
    title={Matryoshka Representation Learning},
    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year={2024},
    eprint={2205.13147},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}