all-MiniLM-L6-v2 Legal Matryoshka

This is a sentence-transformers model finetuned from sentence-transformers/all-MiniLM-L6-v2 on the json dataset. It maps sentences and paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: sentence-transformers/all-MiniLM-L6-v2
  • Maximum Sequence Length: 256 tokens
  • Output Dimensionality: 384 dimensions
  • Similarity Function: Cosine Similarity
  • Training Dataset:
    • json
  • Language: en
  • License: apache-2.0

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 256, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
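
The last two modules are simple enough to sketch by hand. The following is a minimal pure-Python illustration, not the library's implementation, of mean pooling over non-padding tokens followed by L2 normalization. The function names and the toy 4-dimensional vectors are hypothetical stand-ins for the real 384-dimensional token outputs.

```python
import math


def mean_pool(token_embeddings, attention_mask):
    """Average token vectors, skipping positions where the mask is 0 (padding)."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                total[i] += value
    # Guard against an all-padding input to avoid dividing by zero.
    count = max(count, 1)
    return [value / count for value in total]


def l2_normalize(vector):
    """Scale a vector to unit length, mirroring what a Normalize() step does."""
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector] if norm > 0 else list(vector)


# Toy 4-dimensional token vectors (hypothetical); the model itself uses 384.
tokens = [[1.0, 2.0, 0.0, 0.0], [3.0, 0.0, 0.0, 0.0], [9.0, 9.0, 9.0, 9.0]]
mask = [1, 1, 0]  # the third position is padding and is ignored

pooled = mean_pool(tokens, mask)
sentence_embedding = l2_normalize(pooled)
print(pooled)
print(sentence_embedding)
```

Because the output has unit length, the dot product of two such embeddings equals their cosine similarity.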

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("IoannisKat1/all-MiniLM-L6-v2-legal-matryoshka")
# Run inference
sentences = [
    'When may the controller charge a reasonable fee to the data subject?',
    '1.The controller shall take appropriate measures to provide any information referred to in Articles 13 and 14 and any communication under Articles 15 to 22 and 34 relating to processing to the data subject in a concise, transparent, intelligible and easily accessible form, using clear and plain language, in particular for any information addressed specifically to a child. The information shall be provided in writing, or by other means, including, where appropriate, by electronic means. When requested by the data subject, the information may be provided orally, provided that the identity of the data subject is proven by other means. 4.5.2016 L 119/39  \n2.The controller shall facilitate the exercise of data subject rights under Articles 15 to 22. In the cases referred to in Article 11(2), the controller shall not refuse to act on the request of the data subject for exercising his or her rights under Articles 15 to 22, unless the controller demonstrates that it is not in a position to identify the data subject.\n3.The controller shall provide information on action taken on a request under Articles 15 to 22 to the data subject without undue delay and in any event within one month of receipt of the request. That period may be extended by two further months where necessary, taking into account the complexity and number of the requests. The controller shall inform the data subject of any such extension within one month of receipt of the request, together with the reasons for the delay. Where the data subject makes the request by electronic form means, the information shall be provided by electronic means where possible, unless otherwise requested by the data subject.\n4.If the controller does not take action on the request of the data subject, the controller shall inform the data subject without delay and at the latest within one month of receipt of the request of the reasons for not taking action and on the possibility of lodging a complaint with a supervisory authority and seeking a judicial remedy.\n5.Information provided under Articles 13 and 14 and any communication and any actions taken under Articles 15 to 22 and 34 shall be provided free of charge. Where requests from a data subject are manifestly unfounded or excessive, in particular because of their repetitive character, the controller may either: (a)  charge a reasonable fee taking into account the administrative costs of providing the information or communication or taking the action requested; or (b)  refuse to act on the request. The controller shall bear the burden of demonstrating the manifestly unfounded or excessive character of the request.\n6.Without prejudice to Article 11, where the controller has reasonable doubts concerning the identity of the natural person making the request referred to in Articles 15 to 21, the controller may request the provision of additional information necessary to confirm the identity of the data subject.\n7.The information to be provided to data subjects pursuant to Articles 13 and 14 may be provided in combination with standardised icons in order to give in an easily visible, intelligible and clearly legible manner a meaningful overview of the intended processing. Where the icons are presented electronically they shall be machine-readable.\n8.The Commission shall be empowered to adopt delegated acts in accordance with Article 92 for the purpose of determining the information to be presented by the icons and the procedures for providing standardised icons. Section 2 Information and access to personal data',
    'In order to ensure a consistent and high level of protection of natural persons and to remove the obstacles to flows of personal data within the Union, the level of protection of the rights and freedoms of natural persons with regard to the processing of such data should be equivalent in all Member States. Consistent and homogenous application of the rules for the protection of the fundamental rights and freedoms of natural persons with regard to the processing of personal data should be ensured throughout the Union. Regarding the processing of personal data for compliance with a legal obligation, for the performance of a task carried out in the public interest or in the exercise of official authority vested in the controller, Member States should be allowed to maintain or introduce national provisions to further specify the application of the rules of this Regulation. In conjunction with the general and horizontal law on data protection implementing Directive 95/46/EC, Member States have several sector-specific laws in areas that need more specific provisions. This Regulation also provides a margin of manoeuvre for Member States to specify its rules, including for the processing of special categories of personal data (‘sensitive data’). To that extent, this Regulation does not exclude Member State law that sets out the circumstances for specific processing situations, including determining more precisely the conditions under which the processing of personal data is lawful. 4.5.2016 L 119/2 Official Journal of the European Union EN',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 384]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
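
Since training used MatryoshkaLoss with dimensions 384, 256, 128 and 64, the leading components of each embedding are intended to remain useful on their own. Sentence Transformers accepts a truncate_dim argument when loading a model for this purpose. The sketch below shows the underlying operation by hand (keep the first components, re-normalize, rank by cosine similarity) on made-up 8-dimensional vectors. The helper names and the data are assumptions for illustration only.

```python
import math


def truncate_and_normalize(embedding, dim):
    """Keep the first `dim` components and rescale to unit length."""
    head = list(embedding[:dim])
    norm = math.sqrt(sum(v * v for v in head))
    return [v / norm for v in head] if norm > 0 else head


def cosine(a, b):
    """Cosine similarity; for unit vectors this reduces to the dot product."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank(query, passages, dim):
    """Return passage indices sorted from most to least similar to the query."""
    q = truncate_and_normalize(query, dim)
    scores = [cosine(q, truncate_and_normalize(p, dim)) for p in passages]
    return sorted(range(len(passages)), key=lambda i: scores[i], reverse=True)


# Hypothetical 8-dimensional vectors standing in for 384-dimensional embeddings.
query = [0.9, 0.1, 0.3, 0.0, 0.1, 0.0, 0.2, 0.1]
passages = [
    [0.8, 0.2, 0.4, 0.1, 0.0, 0.1, 0.1, 0.0],  # points in a similar direction
    [0.0, 0.9, 0.1, 0.7, 0.3, 0.2, 0.0, 0.4],  # points elsewhere
]

for dim in (8, 4, 2):
    print(dim, rank(query, passages, dim))
```

Smaller dimensions reduce storage and search cost; the Evaluation section reports how much retrieval quality each truncation level retained for this model.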

Evaluation

Metrics

Information Retrieval (dim_384)

Metric Value
cosine_accuracy@1 0.4141
cosine_accuracy@3 0.452
cosine_accuracy@5 0.4874
cosine_accuracy@10 0.5354
cosine_precision@1 0.4141
cosine_precision@3 0.4015
cosine_precision@5 0.3732
cosine_precision@10 0.3361
cosine_recall@1 0.0822
cosine_recall@3 0.2122
cosine_recall@5 0.279
cosine_recall@10 0.3908
cosine_ndcg@10 0.4708
cosine_mrr@10 0.4411
cosine_map@100 0.5192

Information Retrieval (dim_256)

Metric Value
cosine_accuracy@1 0.4167
cosine_accuracy@3 0.4495
cosine_accuracy@5 0.4722
cosine_accuracy@10 0.5227
cosine_precision@1 0.4167
cosine_precision@3 0.4007
cosine_precision@5 0.3692
cosine_precision@10 0.3278
cosine_recall@1 0.0832
cosine_recall@3 0.2108
cosine_recall@5 0.2719
cosine_recall@10 0.3754
cosine_ndcg@10 0.4636
cosine_mrr@10 0.4395
cosine_map@100 0.5182

Information Retrieval (dim_128)

Metric Value
cosine_accuracy@1 0.4167
cosine_accuracy@3 0.452
cosine_accuracy@5 0.4722
cosine_accuracy@10 0.5
cosine_precision@1 0.4167
cosine_precision@3 0.4057
cosine_precision@5 0.3753
cosine_precision@10 0.3258
cosine_recall@1 0.0793
cosine_recall@3 0.2059
cosine_recall@5 0.2724
cosine_recall@10 0.3712
cosine_ndcg@10 0.4585
cosine_mrr@10 0.4376
cosine_map@100 0.5028

Information Retrieval (dim_64)

Metric Value
cosine_accuracy@1 0.3712
cosine_accuracy@3 0.3965
cosine_accuracy@5 0.4167
cosine_accuracy@10 0.4596
cosine_precision@1 0.3712
cosine_precision@3 0.3561
cosine_precision@5 0.3273
cosine_precision@10 0.2891
cosine_recall@1 0.0748
cosine_recall@3 0.1863
cosine_recall@5 0.2404
cosine_recall@10 0.3351
cosine_ndcg@10 0.409
cosine_mrr@10 0.3897
cosine_map@100 0.4594
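
For readers unfamiliar with these metrics, the sketch below computes accuracy, precision, recall, MRR and binary-relevance nDCG at a cutoff k for a single query. The reported values are averages over all evaluation queries. This is a simplified illustration with hypothetical document IDs, not the evaluator code that produced the tables. It also shows why recall can sit well below precision at the same cutoff: when a query has several relevant passages, a single hit in the top k covers only a fraction of them.

```python
import math


def ir_metrics(ranked_ids, relevant_ids, k):
    """Compute accuracy, precision, recall, MRR and nDCG at cutoff k for one query."""
    top_k = ranked_ids[:k]
    hits = [1 if doc_id in relevant_ids else 0 for doc_id in top_k]

    accuracy = 1.0 if any(hits) else 0.0      # at least one relevant result
    precision = sum(hits) / k                 # share of the top k that is relevant
    recall = sum(hits) / len(relevant_ids)    # share of relevant docs retrieved

    # Reciprocal rank of the first relevant document, 0 if none is in the top k.
    mrr = 0.0
    for position, hit in enumerate(hits, start=1):
        if hit:
            mrr = 1.0 / position
            break

    # Discounted gain of this ranking divided by the gain of an ideal ranking.
    dcg = sum(hit / math.log2(pos + 1) for pos, hit in enumerate(hits, start=1))
    ideal_hits = min(len(relevant_ids), k)
    idcg = sum(1.0 / math.log2(pos + 1) for pos in range(1, ideal_hits + 1))
    ndcg = dcg / idcg if idcg > 0 else 0.0

    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "mrr": mrr,
        "ndcg": ndcg,
    }


# Hypothetical ranking for one query with four relevant passages.
ranked = ["d3", "d1", "d7", "d2", "d9"]
relevant = {"d1", "d2", "d4", "d5"}
metrics = ir_metrics(ranked, relevant, k=3)
print(metrics)
```

In this toy case the only hit in the top 3 is at rank 2, so precision is 1/3 while recall is 1/4.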

Training Details

Training Dataset

json

  • Dataset: json
  • Size: 1,580 training samples
  • Columns: anchor and positive
  • Approximate statistics based on the first 1000 samples:
    • anchor (string): min 7 tokens, mean 15.59 tokens, max 37 tokens
    • positive (string): min 25 tokens, mean 217.01 tokens, max 256 tokens
  • Samples:
    • anchor: What should the right to receive personal data not prejudice according to the text?
      positive: To further strengthen the control over his or her own data, where the processing of personal data is carried out by automated means, the data subject should also be allowed to receive personal data concerning him or her which he or she has provided to a controller in a structured, commonly used, machine-readable and interoperable format, and to transmit it to another controller. Data controllers should be encouraged to develop interoperable formats that enable data portability. That right should apply where the data subject provided the personal data on the basis of his or her consent or the processing is necessary for the performance of a contract. It should not apply where processing is based on a legal ground other than consent or contract. By its very nature, that right should not be exercised against controllers processing personal data in the exercise of their public duties. It should therefore not apply where the processing of the personal data is necessary for compliance with a...
    • anchor: What devices can the plaintiff use to conduct transactions without physical presence at the branches of the defendant bank?
      positive: **Court (Civil/Criminal): Civil
      Provisions:
      Time of commission of the act:
      Outcome (not guilty, guilty): Rejects the lawsuit.
      Reasoning:
      Facts: The plaintiff holds account number ....................... at the defendant bank. Following the application/contract dated January 9, 2019, the plaintiff became a subscriber to the alternative service networks provided by the defendant bank through online banking (.........................). In the aforementioned application, the plaintiff stated that her mobile phone (number .................) would be used by the defendant to send additional security codes for the approval of her transactions via .......................... It is noted that the plaintiff received subscriber code .................., thus enabling her to conduct transactions without her physical presence at the branches of the defendant bank, from fixed or mobile devices (computers, smartphones, tablets), by entering her username and password to access her personal acc...
    • anchor: What may a complainant do if a complaint has been rejected by a supervisory authority?
      positive: Any natural or legal person has the right to bring an action for annulment of decisions of the Board before the Court of Justice under the conditions provided for in Article 263 TFEU. As addressees of such decisions, the supervisory authorities concerned which wish to challenge them have to bring action within two months of being notified of them, in accordance with Article 263 TFEU. Where decisions of the Board are of direct and individual concern to a controller, processor or complainant, the latter may bring an action for annulment against those decisions within two months of their publication on the website of the Board, in accordance with Article 263 TFEU. Without prejudice to this right under Article 263 TFEU, each natural or legal person should have an effective judicial remedy before the competent national court against a decision of a supervisory authority which produces legal effects concerning that person. Such a decision concerns in particular the exercise of investigative,...
  • Loss: MatryoshkaLoss with these parameters:
    {
        "loss": "MultipleNegativesRankingLoss",
        "matryoshka_dims": [
            384,
            256,
            128,
            64
        ],
        "matryoshka_weights": [
            1,
            1,
            1,
            1
        ],
        "n_dims_per_step": -1
    }
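
As a rough illustration of what this configuration computes, the sketch below applies an in-batch-negatives ranking loss to truncated prefixes of the embeddings and sums the results with the listed weights. It is a simplified pure-Python rendering rather than the library code: the function names, the toy vectors, the stand-in dimensions (4, 2) and the similarity scale of 20 are all assumptions.

```python
import math


def normalize(vector):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector]


def ranking_loss(anchors, positives, scale=20.0):
    """In-batch negatives: each anchor should score its own positive highest.

    The scale value is an assumption made for this sketch.
    """
    a = [normalize(v) for v in anchors]
    p = [normalize(v) for v in positives]
    total = 0.0
    for i, anchor in enumerate(a):
        logits = [scale * sum(x * y for x, y in zip(anchor, pos)) for pos in p]
        # Cross-entropy with the matching positive (index i) as the target,
        # computed with the log-sum-exp trick for numerical stability.
        max_logit = max(logits)
        log_sum_exp = max_logit + math.log(
            sum(math.exp(value - max_logit) for value in logits)
        )
        total += log_sum_exp - logits[i]
    return total / len(a)


def matryoshka_loss(anchors, positives, dims, weights):
    """Weighted sum of the base loss over truncated embedding prefixes."""
    loss = 0.0
    for dim, weight in zip(dims, weights):
        truncated_anchors = [v[:dim] for v in anchors]
        truncated_positives = [v[:dim] for v in positives]
        loss += weight * ranking_loss(truncated_anchors, truncated_positives)
    return loss


# Toy batch of two (anchor, positive) pairs in 4 dimensions (hypothetical data).
anchors = [[1.0, 0.0, 0.2, 0.0], [0.0, 1.0, 0.0, 0.2]]
positives = [[0.9, 0.1, 0.1, 0.0], [0.1, 0.9, 0.0, 0.1]]

aligned = matryoshka_loss(anchors, positives, dims=(4, 2), weights=(1, 1))
swapped = matryoshka_loss(anchors, positives[::-1], dims=(4, 2), weights=(1, 1))
print(aligned, swapped)
```

The loss is small when each anchor is closest to its own positive at every truncation level and grows when the pairs are mismatched, which is the pressure that pushes useful information into the leading dimensions.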
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: epoch
  • gradient_accumulation_steps: 2
  • learning_rate: 2e-05
  • num_train_epochs: 15
  • lr_scheduler_type: cosine
  • warmup_ratio: 0.1
  • bf16: True
  • tf32: True
  • load_best_model_at_end: True
  • optim: adamw_torch_fused
  • batch_sampler: no_duplicates
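
These settings determine the number of optimizer steps and the shape of the learning-rate curve. The arithmetic below is a sketch that assumes a single device, the per_device_train_batch_size of 8 listed under All Hyperparameters, and a standard linear-warmup, cosine-decay schedule. The variable and function names are illustrative.

```python
import math

train_samples = 1580
per_device_batch_size = 8
gradient_accumulation_steps = 2
num_train_epochs = 15
learning_rate = 2e-5
warmup_ratio = 0.1

# Batches per epoch, then optimizer steps after gradient accumulation.
batches_per_epoch = math.ceil(train_samples / per_device_batch_size)
steps_per_epoch = batches_per_epoch // gradient_accumulation_steps
total_steps = steps_per_epoch * num_train_epochs
warmup_steps = math.ceil(total_steps * warmup_ratio)


def lr_at(step):
    """Learning rate at an optimizer step: linear warmup, then cosine decay."""
    if step < warmup_steps:
        return learning_rate * step / warmup_steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return learning_rate * 0.5 * (1.0 + math.cos(math.pi * progress))


print(steps_per_epoch, total_steps, warmup_steps)
for step in (0, warmup_steps, total_steps // 2, total_steps):
    print(step, lr_at(step))
```

Under these assumptions an epoch is 99 optimizer steps, which agrees with the Training Logs, where the epoch 1.0 evaluation is recorded at step 99.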

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: epoch
  • prediction_loss_only: True
  • per_device_train_batch_size: 8
  • per_device_eval_batch_size: 8
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 2
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 15
  • max_steps: -1
  • lr_scheduler_type: cosine
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: True
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: True
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • tp_size: 0
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss dim_384_cosine_ndcg@10 dim_256_cosine_ndcg@10 dim_128_cosine_ndcg@10 dim_64_cosine_ndcg@10
0.1010 10 9.0952 - - - -
0.2020 20 8.9081 - - - -
0.3030 30 7.8026 - - - -
0.4040 40 8.3378 - - - -
0.5051 50 8.0661 - - - -
0.6061 60 9.8242 - - - -
0.7071 70 6.3302 - - - -
0.8081 80 6.734 - - - -
0.9091 90 7.2678 - - - -
1.0 99 - 0.3833 0.3566 0.3414 0.2912
1.0101 100 7.1781 - - - -
1.1111 110 4.8977 - - - -
1.2121 120 6.2696 - - - -
1.3131 130 5.5412 - - - -
1.4141 140 5.219 - - - -
1.5152 150 5.479 - - - -
1.6162 160 4.863 - - - -
1.7172 170 5.7901 - - - -
1.8182 180 4.9455 - - - -
1.9192 190 5.4065 - - - -
2.0 198 - 0.4130 0.3956 0.3815 0.3221
2.0202 200 3.71 - - - -
2.1212 210 4.6268 - - - -
2.2222 220 3.2052 - - - -
2.3232 230 4.692 - - - -
2.4242 240 4.1091 - - - -
2.5253 250 3.7498 - - - -
2.6263 260 3.9706 - - - -
2.7273 270 3.9981 - - - -
2.8283 280 3.5405 - - - -
2.9293 290 4.0765 - - - -
3.0 297 - 0.4333 0.4308 0.4065 0.3411
3.0303 300 2.83 - - - -
3.1313 310 2.667 - - - -
3.2323 320 3.3436 - - - -
3.3333 330 2.9749 - - - -
3.4343 340 2.4349 - - - -
3.5354 350 2.7929 - - - -
3.6364 360 3.1146 - - - -
3.7374 370 2.8317 - - - -
3.8384 380 3.2532 - - - -
3.9394 390 3.5831 - - - -
4.0 396 - 0.4592 0.4413 0.4274 0.3743
4.0404 400 3.166 - - - -
4.1414 410 2.9408 - - - -
4.2424 420 1.9234 - - - -
4.3434 430 2.7478 - - - -
4.4444 440 2.1347 - - - -
4.5455 450 2.5566 - - - -
4.6465 460 2.2541 - - - -
4.7475 470 2.5956 - - - -
4.8485 480 2.4867 - - - -
4.9495 490 2.1738 - - - -
5.0 495 - 0.4465 0.4481 0.4170 0.3747
5.0505 500 2.3014 - - - -
5.1515 510 1.4828 - - - -
5.2525 520 2.046 - - - -
5.3535 530 1.6265 - - - -
5.4545 540 1.9582 - - - -
5.5556 550 2.5307 - - - -
5.6566 560 2.308 - - - -
5.7576 570 1.3316 - - - -
5.8586 580 1.7351 - - - -
5.9596 590 1.8462 - - - -
6.0 594 - 0.4859 0.4706 0.4456 0.3907
6.0606 600 1.5274 - - - -
6.1616 610 2.2816 - - - -
6.2626 620 1.4639 - - - -
6.3636 630 1.3246 - - - -
6.4646 640 1.9837 - - - -
6.5657 650 1.8552 - - - -
6.6667 660 1.5951 - - - -
6.7677 670 1.3286 - - - -
6.8687 680 1.3191 - - - -
6.9697 690 1.7146 - - - -
7.0 693 - 0.4825 0.4686 0.4285 0.3948
7.0707 700 1.3537 - - - -
7.1717 710 1.2444 - - - -
7.2727 720 1.3479 - - - -
7.3737 730 1.9802 - - - -
7.4747 740 1.5059 - - - -
7.5758 750 1.3042 - - - -
7.6768 760 1.3857 - - - -
7.7778 770 1.5275 - - - -
7.8788 780 1.3907 - - - -
7.9798 790 1.5492 - - - -
8.0 792 - 0.4708 0.4636 0.4585 0.409
8.0808 800 1.0543 - - - -
8.1818 810 0.9813 - - - -
8.2828 820 1.1873 - - - -
8.3838 830 1.3405 - - - -
8.4848 840 1.6782 - - - -
8.5859 850 1.2311 - - - -
8.6869 860 1.9342 - - - -
8.7879 870 0.8973 - - - -
8.8889 880 2.1017 - - - -
8.9899 890 1.2202 - - - -
9.0 891 - 0.4764 0.4825 0.4578 0.4128
9.0909 900 1.0014 - - - -
9.1919 910 0.8363 - - - -
9.2929 920 0.9542 - - - -
9.3939 930 1.1752 - - - -
9.4949 940 1.3158 - - - -
9.5960 950 1.7042 - - - -
9.6970 960 0.8945 - - - -
9.7980 970 1.0183 - - - -
9.8990 980 1.099 - - - -
10.0 990 1.5132 0.4816 0.4848 0.4532 0.4187
-1 -1 - 0.4708 0.4636 0.4585 0.4090
  • The saved checkpoint is the epoch 8.0 row (step 792); the final row (-1 -1) repeats its scores.
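
The per-epoch scores can be compared programmatically to see where each truncation level peaks. The sketch below copies the nDCG@10 values from the log above and finds the best epoch per column. The final-evaluation scores coincide with epoch 8, which is also where dim_128 peaks; whether dim_128 nDCG@10 was the checkpoint-selection metric is not stated in this card, so treat that reading as an assumption.

```python
# Per-epoch nDCG@10 values copied from the training log above.
# Tuple order: dim_384, dim_256, dim_128, dim_64.
eval_rows = {
    1: (0.3833, 0.3566, 0.3414, 0.2912),
    2: (0.4130, 0.3956, 0.3815, 0.3221),
    3: (0.4333, 0.4308, 0.4065, 0.3411),
    4: (0.4592, 0.4413, 0.4274, 0.3743),
    5: (0.4465, 0.4481, 0.4170, 0.3747),
    6: (0.4859, 0.4706, 0.4456, 0.3907),
    7: (0.4825, 0.4686, 0.4285, 0.3948),
    8: (0.4708, 0.4636, 0.4585, 0.4090),
    9: (0.4764, 0.4825, 0.4578, 0.4128),
    10: (0.4816, 0.4848, 0.4532, 0.4187),
}
columns = ("dim_384", "dim_256", "dim_128", "dim_64")

# For each column, pick the epoch with the highest score.
best = {}
for index, name in enumerate(columns):
    best_epoch = max(eval_rows, key=lambda epoch: eval_rows[epoch][index])
    best[name] = (best_epoch, eval_rows[best_epoch][index])

for name in columns:
    print(name, best[name])
```

The larger dimensions peak at different epochs than the saved checkpoint, so a different selection metric would have kept a different checkpoint.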

Framework Versions

  • Python: 3.11.13
  • Sentence Transformers: 4.1.0
  • Transformers: 4.51.3
  • PyTorch: 2.6.0+cu124
  • Accelerate: 1.8.1
  • Datasets: 4.0.0
  • Tokenizers: 0.21.2

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MatryoshkaLoss

@misc{kusupati2024matryoshka,
    title={Matryoshka Representation Learning},
    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year={2024},
    eprint={2205.13147},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}