ModernBERT Embed base Legal Matryoshka

This is a sentence-transformers model fine-tuned from sentence-transformers/all-mpnet-base-v2 on the json dataset. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more. It was trained with MatryoshkaLoss, so embeddings can also be truncated to 512, 256, 128, or 64 dimensions.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: sentence-transformers/all-mpnet-base-v2
  • Maximum Sequence Length: 384 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity
  • Training Dataset:
    • json
  • Language: en
  • License: apache-2.0

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 384, 'do_lower_case': False}) with Transformer model: MPNetModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
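
The Pooling and Normalize modules above can be illustrated with a minimal pure-Python sketch. It assumes mean pooling over non-padding tokens followed by L2 normalization, as the configuration indicates; the function names and the toy vectors are hypothetical and are not part of the library.

```python
import math

def mean_pool(token_embeddings, attention_mask):
    """Average token vectors, counting only positions where the mask is 1."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                total[i] += value
    return [value / max(count, 1) for value in total]

def l2_normalize(vector):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(value * value for value in vector))
    return [value / norm for value in vector] if norm > 0 else list(vector)

# Toy input: three token vectors of width 2; the last one is padding.
tokens = [[3.0, 0.0], [0.0, 4.0], [9.0, 9.0]]
mask = [1, 1, 0]

pooled = mean_pool(tokens, mask)            # [1.5, 2.0]
sentence_embedding = l2_normalize(pooled)   # approximately [0.6, 0.8]
print(pooled, sentence_embedding)
```

Because the output is unit length, cosine similarity between two sentence embeddings reduces to a dot product.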

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("IoannisKat1/all-mpnet-base-v2-legal-matryoshka")
# Run inference
sentences = [
    'What is the right given in point (e)?',
    "1.Where personal data have not been obtained from the data subject, the controller shall provide the data subject with the following information: (a)  the identity and the contact details of the controller and, where applicable, of the controller's representative; (b)  the contact details of the data protection officer, where applicable; (c)  the purposes of the processing for which the personal data are intended as well as the legal basis for the processing; (d)  the categories of personal data concerned; (e)  the recipients or categories of recipients of the personal data, if any; 4.5.2016 L 119/41   (f) where applicable, that the controller intends to transfer personal data to a recipient in a third country or international organisation and the existence or absence of an adequacy decision by the Commission, or in the case of transfers referred to in Article 46 or 47, or the second subparagraph of Article 49(1), reference to the appropriate or suitable safeguards and the means to obtain a copy of them or where they have been made available.\n2.In addition to the information referred to in paragraph 1, the controller shall provide the data subject with the following information necessary to ensure fair and transparent processing in respect of the data subject: (a)  the period for which the personal data will be stored, or if that is not possible, the criteria used to determine that period; (b)  where the processing is based on point (f) of Article 6(1), the legitimate interests pursued by the controller or by a third party; (c)  the existence of the right to request from the controller access to and rectification or erasure of personal data or restriction of processing concerning the data subject and to object to processing as well as the right to data portability; (d)  where processing is based on point (a) of Article 6(1) or point (a) of Article 9(2), the existence of the right to withdraw consent at any time, without affecting the lawfulness of processing based on consent before its withdrawal; (e)  the right to lodge a complaint with a supervisory authority; (f)  from which source the personal data originate, and if applicable, whether it came from publicly accessible sources; (g)  the existence of automated decision-making, including profiling, referred to in Article 22(1) and (4) and, at least in those cases, meaningful information about the logic involved, as well as the significance and the envisaged consequences of such processing for the data subject.\n3.The controller shall provide the information referred to in paragraphs 1 and 2: (a)  within a reasonable period after obtaining the personal data, but at the latest within one month, having regard to the specific circumstances in which the personal data are processed; (b)  if the personal data are to be used for communication with the data subject, at the latest at the time of the first communication to that data subject; or (c)  if a disclosure to another recipient is envisaged, at the latest when the personal data are first disclosed.\n4.Where the controller intends to further process the personal data for a purpose other than that for which the personal data were obtained, the controller shall provide the data subject prior to that further processing with information on that other purpose and with any relevant further information as referred to in paragraph 2\n5.Paragraphs 1 to 4 shall not apply where and insofar as: (a)  the data subject already has the information; (b)  the provision of such information proves impossible or would involve a disproportionate effort, in particular for processing for archiving purposes in the public interest, scientific or historical research purposes or statistical purposes, subject to the conditions and safeguards referred to in Article 89(1) or in so far as the obligation referred to in paragraph 1 of this Article is likely to render impossible or seriously impair the achievement of the objectives of that processing. In such cases the controller shall take appropriate measures to protect the data subject's rights and freedoms and legitimate interests, including making the information publicly available; (c)  obtaining or disclosure is expressly laid down by Union or Member State law to which the controller is subject and which provides appropriate measures to protect the data subject's legitimate interests; or (d)  where the personal data must remain confidential subject to an obligation of professional secrecy regulated by Union or Member State law, including a statutory obligation of secrecy. 4.5.2016 L 119/42",
    'The risk to the rights and freedoms of natural persons, of varying likelihood and severity, may result from personal data processing which could lead to physical, material or non-material damage, in particular: where the processing may give rise to discrimination, identity theft or fraud, financial loss, damage to the reputation, loss of confidentiality of personal data protected by professional secrecy, unauthorised reversal of pseudonymisation, or any other significant economic or social disadvantage; where data subjects might be deprived of their rights and freedoms or prevented from exercising control over their personal data; where personal data are processed which reveal racial or ethnic origin, political opinions, religion or philosophical beliefs, trade union membership, and the processing of genetic data, data concerning health or data concerning sex life or criminal convictions and offences or related security measures; where personal aspects are evaluated, in particular analysing or predicting aspects concerning performance at work, economic situation, health, personal preferences or interests, reliability or behaviour, location or movements, in order to create or use personal profiles; where personal data of vulnerable natural persons, in particular of children, are processed; or where processing involves a large amount of personal data and affects a large number of data subjects.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
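
Since the model was trained at Matryoshka dimensions 768, 512, 256, 128, and 64, a prefix of each embedding can stand in for the full vector. In Sentence Transformers this is usually done by passing `truncate_dim` when constructing `SentenceTransformer`. The sketch below shows only the underlying idea in pure Python: slice the vector, then re-normalize, since a truncated prefix is generally no longer unit length. The helper names and the stand-in vectors are hypothetical and are not real model output.

```python
import math

MATRYOSHKA_DIMS = (768, 512, 256, 128, 64)  # dims listed in the training loss config

def truncate_and_normalize(embedding, dim):
    """Keep the first `dim` components and rescale the prefix to unit length."""
    if dim not in MATRYOSHKA_DIMS:
        raise ValueError(f"dim must be one of {MATRYOSHKA_DIMS}")
    prefix = list(embedding[:dim])
    norm = math.sqrt(sum(x * x for x in prefix))
    return [x / norm for x in prefix] if norm > 0 else prefix

def cosine(a, b):
    """For unit-length vectors, cosine similarity is the dot product."""
    return sum(x * y for x, y in zip(a, b))

# Stand-in vectors (hypothetical values, not produced by the model).
query = [math.sin(i) for i in range(768)]
passage = [math.sin(i + 0.1) for i in range(768)]

for dim in MATRYOSHKA_DIMS:
    q = truncate_and_normalize(query, dim)
    p = truncate_and_normalize(passage, dim)
    print(dim, len(q), round(cosine(q, p), 4))
```

Smaller dimensions reduce index size and search cost; the evaluation tables in this card show how much retrieval quality is given up at each size.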

Evaluation

Metrics

Information Retrieval

Metrics are reported per Matryoshka dimension. The columns follow the order of the cosine_ndcg@10 columns in the training logs (dim_768 to dim_64), whose final row matches the cosine_ndcg@10 values here.

Metric               dim_768  dim_512  dim_256  dim_128  dim_64
cosine_accuracy@1    0.4343   0.4444   0.4394   0.4268   0.3712
cosine_accuracy@3    0.4848   0.4924   0.4798   0.4571   0.4015
cosine_accuracy@5    0.5278   0.5328   0.5303   0.4975   0.4343
cosine_accuracy@10   0.5732   0.5884   0.5758   0.5379   0.4874
cosine_precision@1   0.4343   0.4444   0.4394   0.4268   0.3712
cosine_precision@3   0.4192   0.4293   0.4251   0.4116   0.3552
cosine_precision@5   0.3899   0.4      0.397    0.3793   0.3298
cosine_precision@10  0.3402   0.3523   0.3465   0.3275   0.2841
cosine_recall@1      0.0983   0.0962   0.0916   0.0875   0.0794
cosine_recall@3      0.2494   0.2433   0.2397   0.2301   0.2022
cosine_recall@5      0.3277   0.3239   0.3238   0.3047   0.274
cosine_recall@10     0.4403   0.4374   0.434    0.4146   0.3728
cosine_ndcg@10       0.5025   0.5105   0.5024   0.4785   0.4203
cosine_mrr@10        0.468    0.4773   0.4707   0.4517   0.3955
cosine_map@100       0.5626   0.564    0.5531   0.5307   0.4722
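
To make the metric names concrete, here is a minimal sketch of how accuracy@k, precision@k, recall@k, and MRR@k are commonly computed for a single query. It assumes the usual definitions (a hit anywhere in the top k, hits divided by k, hits divided by the number of relevant documents, and the reciprocal rank of the first hit); the reported numbers would then be averages over all evaluation queries. The function name and the toy ranking are hypothetical.

```python
def retrieval_metrics(ranked_ids, relevant_ids, k):
    """Compute accuracy@k, precision@k, recall@k and MRR@k for one query."""
    top_k = ranked_ids[:k]
    hits = [doc_id in relevant_ids for doc_id in top_k]
    num_hits = sum(hits)
    first_hit_rank = next((rank for rank, hit in enumerate(hits, start=1) if hit), None)
    return {
        "accuracy": 1.0 if num_hits > 0 else 0.0,
        "precision": num_hits / k,
        "recall": num_hits / len(relevant_ids),
        "mrr": 1.0 / first_hit_rank if first_hit_rank else 0.0,
    }

# Hypothetical query with four relevant passages; the ranker puts one of them first.
ranked = ["p7", "p2", "p9", "p4", "p1"]
relevant = {"p7", "p4", "p5", "p8"}

at_1 = retrieval_metrics(ranked, relevant, k=1)
at_5 = retrieval_metrics(ranked, relevant, k=5)
print(at_1)  # accuracy 1.0, precision 1.0, recall 0.25, mrr 1.0
print(at_5)  # accuracy 1.0, precision 0.4, recall 0.5, mrr 1.0
```

Under these definitions, recall@1 can be far below accuracy@1 when a query has several relevant passages, which is consistent with the gap between those two rows in the reported results.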

Training Details

Training Dataset

json

  • Dataset: json
  • Size: 1,580 training samples
  • Columns: anchor and positive
  • Approximate statistics based on the first 1000 samples:
             anchor         positive
    type     string         string
    min      7 tokens       31 tokens
    mean     15.41 tokens   294.46 tokens
    max      32 tokens      384 tokens
  • Samples:
    Sample 1
      anchor: Who is empowered to adopt delegated acts according to Article 92?
      positive: 1.The controller shall take appropriate measures to provide any information referred to in Articles 13 and 14 and any communication under Articles 15 to 22 and 34 relating to processing to the data subject in a concise, transparent, intelligible and easily accessible form, using clear and plain language, in particular for any information addressed specifically to a child. The information shall be provided in writing, or by other means, including, where appropriate, by electronic means. When requested by the data subject, the information may be provided orally, provided that the identity of the data subject is proven by other means. 4.5.2016 L 119/39
      2.The controller shall facilitate the exercise of data subject rights under Articles 15 to 22. In the cases referred to in Article 11(2), the controller shall not refuse to act on the request of the data subject for exercising his or her rights under Articles 15 to 22, unless the controller demonstrates that it is not in a position to ide...
    Sample 2
      anchor: What is the specific range of fines for violating the provisions mentioned?
      positive: Rights management information includes data identifying the work, its rightholder, terms of use, or codes representing such information.
      It is prohibited to knowingly remove or alter rights management information, or distribute protected works without such information if this facilitates copyright infringement.
      Violation of these provisions is punished by imprisonment of at least one year and a fine of 2,900 to 15,000 Euro, with applicable civil sanctions under article 65 of Law 2121/1993.
    Sample 3
      anchor: What is the purpose of specifying the controller or categories of controllers in a legislative measure?
      positive: 1.Union or Member State law to which the data controller or processor is subject may restrict by way of a legislative measure the scope of the obligations and rights provided for in Articles 12 to 22 and Article 34, as well as Article 5 in so far as its provisions correspond to the rights and obligations provided for in Articles 12 to 22, when such a restriction respects the essence of the fundamental rights and freedoms and is a necessary and proportionate measure in a democratic society to safeguard: (a) national security; (b) defence; (c) public security; 4.5.2016 L 119/46 (d) the prevention, investigation, detection or prosecution of criminal offences or the execution of criminal penalties, including the safeguarding against and the prevention of threats to public security; (e) other important objectives of general public interest of the Union or of a Member State, in particular an important economic or financial interest of the Union or of a Member State, including monetary...
  • Loss: MatryoshkaLoss with these parameters:
    {
        "loss": "MultipleNegativesRankingLoss",
        "matryoshka_dims": [
            768,
            512,
            256,
            128,
            64
        ],
        "matryoshka_weights": [
            1,
            1,
            1,
            1,
            1
        ],
        "n_dims_per_step": -1
    }
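
The configuration above can be read as: apply the base loss to each prefix of the embeddings and add the results with the given weights (with `n_dims_per_step` set to -1, every listed dimension is used at each step). The following is a simplified pure-Python sketch of that idea, not the library implementation. It assumes the base loss is an in-batch-negatives cross-entropy over scaled cosine similarities with a scale of 20; the function names and the 4-dimensional toy batch are hypothetical.

```python
import math

def cos_sim(a, b):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def mnr_loss(anchors, positives, scale=20.0):
    """In-batch negatives: each anchor should score its own positive highest."""
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [scale * cos_sim(anchor, positive) for positive in positives]
        max_logit = max(logits)
        log_sum_exp = max_logit + math.log(sum(math.exp(l - max_logit) for l in logits))
        total += log_sum_exp - logits[i]  # cross-entropy with target index i
    return total / len(anchors)

def matryoshka_loss(anchors, positives, dims, weights):
    """Weighted sum of the base loss over prefix-truncated embeddings."""
    return sum(
        weight * mnr_loss([a[:dim] for a in anchors], [p[:dim] for p in positives])
        for dim, weight in zip(dims, weights)
    )

# Toy batch of two (anchor, positive) pairs with 4-dimensional embeddings.
anchors = [[1.0, 0.0, 0.2, 0.1], [0.0, 1.0, 0.1, 0.3]]
positives = [[0.9, 0.1, 0.3, 0.0], [0.1, 0.8, 0.0, 0.4]]

loss = matryoshka_loss(anchors, positives, dims=[4, 2], weights=[1, 1])
print(loss)
```

Training on every prefix at once is what lets the leading 64, 128, 256, or 512 components remain useful on their own.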
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: epoch
  • gradient_accumulation_steps: 2
  • learning_rate: 2e-05
  • num_train_epochs: 15
  • lr_scheduler_type: cosine
  • warmup_ratio: 0.1
  • bf16: True
  • tf32: True
  • load_best_model_at_end: True
  • optim: adamw_torch_fused
  • batch_sampler: no_duplicates
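
As a rough check on how these settings interact, the sketch below derives the number of optimizer steps per epoch and the shape of the learning-rate schedule. It assumes single-device training, the per_device_train_batch_size of 8 from the full hyperparameter list, warmup steps rounded up from the ratio, and the standard half-cosine decay after linear warmup. The helper `lr_at` is hypothetical and is not a library function.

```python
import math

train_samples = 1580          # from the Training Dataset section
per_device_batch_size = 8     # per_device_train_batch_size in the full list
grad_accum_steps = 2
num_epochs = 15
warmup_ratio = 0.1
peak_lr = 2e-05

batches_per_epoch = math.ceil(train_samples / per_device_batch_size)  # 198
steps_per_epoch = math.ceil(batches_per_epoch / grad_accum_steps)     # 99
total_steps = steps_per_epoch * num_epochs                            # 1485
warmup_steps = math.ceil(total_steps * warmup_ratio)                  # 149

def lr_at(step):
    """Linear warmup to peak_lr, then half-cosine decay to zero (assumed shape)."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

print(steps_per_epoch, total_steps, warmup_steps)
print(lr_at(0), lr_at(warmup_steps), lr_at(total_steps))
```

The 99 steps per epoch agree with the training logs, where epoch 1.0 is reached at step 99. The effective batch size under these assumptions is 8 × 2 = 16.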

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: epoch
  • prediction_loss_only: True
  • per_device_train_batch_size: 8
  • per_device_eval_batch_size: 8
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 2
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 15
  • max_steps: -1
  • lr_scheduler_type: cosine
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: True
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: True
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • tp_size: 0
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss dim_768_cosine_ndcg@10 dim_512_cosine_ndcg@10 dim_256_cosine_ndcg@10 dim_128_cosine_ndcg@10 dim_64_cosine_ndcg@10
0.1010 10 12.8665 - - - - -
0.2020 20 10.8982 - - - - -
0.3030 30 9.4829 - - - - -
0.4040 40 10.5 - - - - -
0.5051 50 8.0613 - - - - -
0.6061 60 10.1388 - - - - -
0.7071 70 8.2618 - - - - -
0.8081 80 7.5501 - - - - -
0.9091 90 7.2818 - - - - -
1.0 99 - 0.4229 0.4277 0.4160 0.3713 0.3255
1.0101 100 7.1823 - - - - -
1.1111 110 6.4082 - - - - -
1.2121 120 6.0372 - - - - -
1.3131 130 5.3984 - - - - -
1.4141 140 6.1314 - - - - -
1.5152 150 6.017 - - - - -
1.6162 160 5.3294 - - - - -
1.7172 170 4.8586 - - - - -
1.8182 180 5.1905 - - - - -
1.9192 190 4.9228 - - - - -
2.0 198 - 0.4948 0.4847 0.4656 0.4279 0.3989
2.0202 200 5.0552 - - - - -
2.1212 210 3.605 - - - - -
2.2222 220 3.4013 - - - - -
2.3232 230 3.8835 - - - - -
2.4242 240 3.5379 - - - - -
2.5253 250 3.1477 - - - - -
2.6263 260 3.0839 - - - - -
2.7273 270 3.1072 - - - - -
2.8283 280 3.4296 - - - - -
2.9293 290 2.2994 - - - - -
3.0 297 - 0.4817 0.4640 0.4588 0.4456 0.3951
3.0303 300 2.823 - - - - -
3.1313 310 2.4173 - - - - -
3.2323 320 2.9838 - - - - -
3.3333 330 1.7402 - - - - -
3.4343 340 1.9698 - - - - -
3.5354 350 2.0855 - - - - -
3.6364 360 2.0332 - - - - -
3.7374 370 2.0153 - - - - -
3.8384 380 2.3639 - - - - -
3.9394 390 2.5413 - - - - -
4.0 396 - 0.5025 0.5105 0.5024 0.4785 0.4203
4.0404 400 2.1864 - - - - -
4.1414 410 1.9434 - - - - -
4.2424 420 1.9391 - - - - -
4.3434 430 1.6913 - - - - -
4.4444 440 2.1447 - - - - -
4.5455 450 1.844 - - - - -
4.6465 460 1.4044 - - - - -
4.7475 470 1.2469 - - - - -
4.8485 480 1.7656 - - - - -
4.9495 490 1.9071 - - - - -
5.0 495 - 0.4948 0.4976 0.4972 0.4539 0.4277
5.0505 500 1.5534 - - - - -
5.1515 510 1.2796 - - - - -
5.2525 520 1.8969 - - - - -
5.3535 530 1.679 - - - - -
5.4545 540 1.2078 - - - - -
5.5556 550 1.5672 - - - - -
5.6566 560 0.9042 - - - - -
5.7576 570 0.9742 - - - - -
5.8586 580 1.9878 - - - - -
5.9596 590 1.6131 - - - - -
6.0 594 - 0.5176 0.5066 0.5065 0.4751 0.4504
-1 -1 - 0.5025 0.5105 0.5024 0.4785 0.4203
  • The saved checkpoint is the epoch 4.0 row (step 396); the final row (Epoch -1, Step -1) repeats its scores.
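
The logs do not state which metric drove `load_best_model_at_end`. As a small exploratory sketch, the per-epoch evaluation rows can be compared with the final row to see which epoch it corresponds to and which dimensions peak at that epoch. The values are copied from the table above; the variable names are hypothetical.

```python
# Per-epoch cosine_ndcg@10 values copied from the Training Logs table.
DIMS = (768, 512, 256, 128, 64)
EVAL_ROWS = {
    1: (0.4229, 0.4277, 0.4160, 0.3713, 0.3255),
    2: (0.4948, 0.4847, 0.4656, 0.4279, 0.3989),
    3: (0.4817, 0.4640, 0.4588, 0.4456, 0.3951),
    4: (0.5025, 0.5105, 0.5024, 0.4785, 0.4203),
    5: (0.4948, 0.4976, 0.4972, 0.4539, 0.4277),
    6: (0.5176, 0.5066, 0.5065, 0.4751, 0.4504),
}
FINAL_ROW = (0.5025, 0.5105, 0.5024, 0.4785, 0.4203)  # the "-1 -1" row

# Epoch whose evaluation scores equal the final row.
saved_epoch = next(epoch for epoch, row in EVAL_ROWS.items() if row == FINAL_ROW)

# For each dimension, the epoch with the highest cosine_ndcg@10.
best_epoch_per_dim = {
    dim: max(EVAL_ROWS, key=lambda epoch: EVAL_ROWS[epoch][i])
    for i, dim in enumerate(DIMS)
}
print(saved_epoch)         # 4
print(best_epoch_per_dim)  # {768: 6, 512: 4, 256: 6, 128: 4, 64: 6}
```

Only the 512- and 128-dimensional scores peak at epoch 4, so the selection metric may have been one of those, though the card does not confirm this.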

Framework Versions

  • Python: 3.11.13
  • Sentence Transformers: 4.1.0
  • Transformers: 4.51.3
  • PyTorch: 2.6.0+cu124
  • Accelerate: 1.8.1
  • Datasets: 4.0.0
  • Tokenizers: 0.21.2

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MatryoshkaLoss

@misc{kusupati2024matryoshka,
    title={Matryoshka Representation Learning},
    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year={2024},
    eprint={2205.13147},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}

Model size: 109M params (Safetensors, tensor type F32)