BERT base uncased Legal Matryoshka

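This is a sentence-transformers model fine-tuned from google-bert/bert-base-uncased on the json dataset of legal questions paired with relevant passages. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more. Because it was trained with MatryoshkaLoss, its embeddings can also be truncated to 512, 256, 128, or 64 dimensions at some cost in retrieval quality.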

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: google-bert/bert-base-uncased
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity
  • Training Dataset:
    • json
  • Language: en
  • License: apache-2.0


Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
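The Pooling module above enables only `pooling_mode_mean_tokens`, so a sentence embedding is the average of the transformer's token vectors over real (non-padding) tokens. The following is a minimal NumPy sketch of that masked mean, assuming token embeddings of shape (batch, seq_len, dim) and a 0/1 attention mask; `mean_pool` is a hypothetical name and this is an illustration, not the library's own implementation.

```python
import numpy as np


def mean_pool(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token vectors over real (non-padding) tokens.

    token_embeddings: (batch, seq_len, dim) output of the transformer
    attention_mask:   (batch, seq_len), 1 for real tokens and 0 for padding
    """
    mask = attention_mask[:, :, None].astype(token_embeddings.dtype)  # (batch, seq_len, 1)
    summed = (token_embeddings * mask).sum(axis=1)                    # (batch, dim)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)                    # avoid division by zero
    return summed / counts


# Two toy "sentences" padded to length 3; the second one has a padding token at the end.
tokens = np.array([
    [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],
    [[2.0, 0.0], [4.0, 2.0], [100.0, 100.0]],  # last position is padding
])
mask = np.array([[1, 1, 1], [1, 1, 0]])
print(mean_pool(tokens, mask))  # rows [3, 4] and [3, 1]; the padded position is ignored
```

Masking before averaging keeps padding from diluting short inputs, which matters here because passages range from 31 to 512 tokens.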

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("IoannisKat1/bert-base-uncased-legal-matryoshka")
# Run inference
sentences = [
    'What remedy is available to a data subject if their rights are infringed?',
    '1.Without prejudice to any available administrative or non-judicial remedy, including the right to lodge a complaint with a supervisory authority pursuant to Article 77, each data subject shall have the right to an effective judicial remedy where he or she considers that his or her rights under this Regulation have been infringed as a result of the processing of his or her personal data in non-compliance with this Regulation.\n2.Proceedings against a controller or a processor shall be brought before the courts of the Member State where the controller or processor has an establishment. Alternatively, such proceedings may be brought before the courts of the Member State where the data subject has his or her habitual residence, unless the controller or processor is a public authority of a Member State acting in the exercise of its public powers. 4.5.2016 L 119/80   (1) Regulation (EC) No 1049/2001 of the European Parliament and of the Council of 30 May 2001 regarding public access to European Parliament, Council and Commission documents (OJ L 145, 31.5.2001, p. 43).',
    '1.The controller shall consult the supervisory authority prior to processing where a data protection impact assessment under Article 35 indicates that the processing would result in a high risk in the absence of measures taken by the controller to mitigate the risk.\n2.Where the supervisory authority is of the opinion that the intended processing referred to in paragraph 1 would infringe this Regulation, in particular where the controller has insufficiently identified or mitigated the risk, the supervisory authority shall, within period of up to eight weeks of receipt of the request for consultation, provide written advice to the controller and, where applicable to the processor, and may use any of its powers referred to in Article 58. That period may be extended by six weeks, taking into account the complexity of the intended processing. The supervisory authority shall inform the controller and, where applicable, the processor, of any such extension within one month of receipt of the request for consultation together with the reasons for the delay. Those periods may be suspended until the supervisory authority has obtained information it has requested for the purposes of the consultation.\n3.When consulting the supervisory authority pursuant to paragraph 1, the controller shall provide the supervisory authority with: (a)  where applicable, the respective responsibilities of the controller, joint controllers and processors involved in the processing, in particular for processing within a group of undertakings; (b)  the purposes and means of the intended processing; (c)  the measures and safeguards provided to protect the rights and freedoms of data subjects pursuant to this Regulation; (d)  where applicable, the contact details of the data protection officer; 4.5.2016 L 119/54   (e)  the data protection impact assessment provided for in Article 35; and (f)  any other information requested by the supervisory authority.\n4.Member States shall consult the supervisory authority during the preparation of a proposal for a legislative measure to be adopted by a national parliament, or of a regulatory measure based on such a legislative measure, which relates to processing.\n5.Notwithstanding paragraph 1, Member State law may require controllers to consult with, and obtain prior authorisation from, the supervisory authority in relation to processing by a controller for the performance of a task carried out by the controller in the public interest, including processing in relation to social protection and public health',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
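Because the model was trained with MatryoshkaLoss over dimensions 768, 512, 256, 128 and 64, the leading components of each embedding are meant to work on their own. A minimal sketch of that idea, assuming the embeddings are already a NumPy array such as the one returned by `model.encode` above: keep the first `dim` values, re-normalize each row, and compare with cosine similarity. The helper name is hypothetical and random vectors stand in for real embeddings. Sentence Transformers also accepts a `truncate_dim` argument when loading a model (for example `SentenceTransformer(..., truncate_dim=256)`), which may be more convenient in practice.

```python
import numpy as np


def truncate_and_normalize(embeddings: np.ndarray, dim: int) -> np.ndarray:
    """Keep the first `dim` components of each row and rescale it to unit length."""
    truncated = embeddings[:, :dim]
    norms = np.linalg.norm(truncated, axis=1, keepdims=True)
    return truncated / np.clip(norms, 1e-12, None)


# Stand-in (assumption) for the (3, 768) array returned by model.encode(sentences).
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(3, 768)).astype(np.float32)

for dim in (768, 512, 256, 128, 64):
    small = truncate_and_normalize(embeddings, dim)
    similarities = small @ small.T  # unit-length rows, so dot product = cosine similarity
    # Row 0 is the query; columns 1 and 2 are its scores against the two passages.
    print(dim, small.shape, similarities[0, 1:])
```

Smaller dimensions reduce storage and search cost; the Evaluation section shows how much retrieval quality this model gives up at each size.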

Evaluation

Metrics

Information Retrieval (dim_768: full 768-dimensional embeddings)

Metric Value
cosine_accuracy@1 0.3914
cosine_accuracy@3 0.4268
cosine_accuracy@5 0.452
cosine_accuracy@10 0.5202
cosine_precision@1 0.3914
cosine_precision@3 0.3754
cosine_precision@5 0.349
cosine_precision@10 0.3058
cosine_recall@1 0.0832
cosine_recall@3 0.2048
cosine_recall@5 0.27
cosine_recall@10 0.3782
cosine_ndcg@10 0.4456
cosine_mrr@10 0.4171
cosine_map@100 0.4895

Information Retrieval (dim_512: embeddings truncated to 512 dimensions)

Metric Value
cosine_accuracy@1 0.3889
cosine_accuracy@3 0.4268
cosine_accuracy@5 0.4495
cosine_accuracy@10 0.5051
cosine_precision@1 0.3889
cosine_precision@3 0.3746
cosine_precision@5 0.3505
cosine_precision@10 0.3033
cosine_recall@1 0.081
cosine_recall@3 0.2
cosine_recall@5 0.2687
cosine_recall@10 0.3701
cosine_ndcg@10 0.4412
cosine_mrr@10 0.4138
cosine_map@100 0.4871

Information Retrieval (dim_256: embeddings truncated to 256 dimensions)

Metric Value
cosine_accuracy@1 0.3813
cosine_accuracy@3 0.4167
cosine_accuracy@5 0.4343
cosine_accuracy@10 0.4949
cosine_precision@1 0.3813
cosine_precision@3 0.367
cosine_precision@5 0.3394
cosine_precision@10 0.2957
cosine_recall@1 0.079
cosine_recall@3 0.1982
cosine_recall@5 0.2594
cosine_recall@10 0.3582
cosine_ndcg@10 0.4305
cosine_mrr@10 0.4046
cosine_map@100 0.4765

Information Retrieval (dim_128: embeddings truncated to 128 dimensions)

Metric Value
cosine_accuracy@1 0.3889
cosine_accuracy@3 0.4066
cosine_accuracy@5 0.4268
cosine_accuracy@10 0.4899
cosine_precision@1 0.3889
cosine_precision@3 0.3712
cosine_precision@5 0.3429
cosine_precision@10 0.2987
cosine_recall@1 0.0763
cosine_recall@3 0.1924
cosine_recall@5 0.2556
cosine_recall@10 0.3572
cosine_ndcg@10 0.4299
cosine_mrr@10 0.4078
cosine_map@100 0.4727

Information Retrieval (dim_64: embeddings truncated to 64 dimensions)

Metric Value
cosine_accuracy@1 0.3561
cosine_accuracy@3 0.3889
cosine_accuracy@5 0.404
cosine_accuracy@10 0.4419
cosine_precision@1 0.3561
cosine_precision@3 0.3426
cosine_precision@5 0.3182
cosine_precision@10 0.272
cosine_recall@1 0.0732
cosine_recall@3 0.1824
cosine_recall@5 0.2429
cosine_recall@10 0.3266
cosine_ndcg@10 0.3954
cosine_mrr@10 0.375
cosine_map@100 0.4422
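The metric names follow the usual binary-relevance definitions for a single query: accuracy@k is 1 if any relevant passage is in the top k, precision@k and recall@k divide the number of relevant hits in the top k by k and by the number of relevant passages, MRR uses the rank of the first hit, and NDCG discounts each hit by log2 of its rank. The sketch below computes these for one ranked list; it illustrates the standard formulas and is not the evaluator that produced the tables (which average over all evaluation queries). The function name and the example ranking are hypothetical.

```python
import math


def ir_metrics(ranked_ids: list[str], relevant: set[str], k: int = 10) -> dict[str, float]:
    """Binary-relevance retrieval metrics for one query, cut off at rank k."""
    top_k = ranked_ids[:k]
    hits = [1 if doc_id in relevant else 0 for doc_id in top_k]

    accuracy = 1.0 if any(hits) else 0.0
    precision = sum(hits) / k
    recall = sum(hits) / len(relevant)

    # Reciprocal rank of the first relevant document within the top k (0 if none).
    mrr = next((1.0 / (i + 1) for i, h in enumerate(hits) if h), 0.0)

    # DCG discounts a hit at 1-based rank r by log2(r + 1); IDCG is the best possible DCG.
    dcg = sum(h / math.log2(i + 2) for i, h in enumerate(hits))
    idcg = sum(1.0 / math.log2(i + 2) for i in range(min(len(relevant), k)))
    ndcg = dcg / idcg if idcg > 0 else 0.0

    return {
        f"accuracy@{k}": accuracy,
        f"precision@{k}": precision,
        f"recall@{k}": recall,
        f"mrr@{k}": mrr,
        f"ndcg@{k}": ndcg,
    }


# Hypothetical query with three relevant passages; results 2 and 4 are hits.
ranking = ["p7", "p2", "p9", "p5", "p1"]
print(ir_metrics(ranking, relevant={"p2", "p5", "p8"}, k=5))
```

With k = 1, precision@1 equals accuracy@1 by construction, which is why those two rows match in every table. Recall@1 being far below precision@1 suggests that evaluation queries typically have several relevant passages.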

Training Details

Training Dataset

json

  • Dataset: json
  • Size: 1,580 training samples
  • Columns: anchor and positive
  • Approximate statistics based on the first 1000 samples:
    • anchor: string; min 7 tokens, mean 15.29 tokens, max 34 tokens
    • positive: string; min 31 tokens, mean 361.9 tokens, max 512 tokens
  • Samples:
    • anchor: By when does each Member State need to notify the Commission of the provisions of its law adopted pursuant to this Chapter?
      positive: 1.Each Member State shall provide for one or more independent public authorities to be responsible for monitoring the application of this Regulation, in order to protect the fundamental rights and freedoms of natural persons in relation to processing and to facilitate the free flow of personal data within the Union (‘supervisory authority’).
      2.Each supervisory authority shall contribute to the consistent application of this Regulation throughout the Union. For that purpose, the supervisory authorities shall cooperate with each other and the Commission in accordance with Chapter VII.
      3.Where more than one supervisory authority is established in a Member State, that Member State shall designate the supervisory authority which is to represent those authorities in the Board and shall set out the mechanism to ensure compliance by the other authorities with the rules relating to the consistency mechanism referred to in Article 63
      4.Each Member State shall notify to the Commission the provisi...
    • anchor: How much was the defendant ordered to pay?
      positive: Court (Civil/Criminal):
      Provisions:
      Time of commission of the act:
      Outcome (not guilty, guilty): ORDERS the defendant to pay the plaintiff the amount of two thousand four hundred thirty-four euros and eighty-three cents (€2,434.83) with legal interest from the service of the lawsuit.
      Reasoning: Law 4537/2018 introduces mandatory provisions in favor of users, as according to Article 103, payment service providers are prohibited from deviating from the provisions to the detriment of payment service users, unless the possibility of deviation is expressly provided, and they can decide to offer only more favorable terms to payment service users. Under this law and its provisions, providers are only liable when there are unusual and unforeseen circumstances beyond the control of the party invoking them, and whose consequences could not have been avoided despite efforts to the contrary. However, operational risks and security risks of the system do not constitute unusual and unforeseen circu...
    • anchor: On what date did the judge grant the motion?
      positive: 1.A transfer of personal data to a third country or an international organisation may take place where the Commission has decided that the third country, a territory or one or more specified sectors within that third country, or the international organisation in question ensures an adequate level of protection. Such a transfer shall not require any specific authorisation.
      2.When assessing the adequacy of the level of protection, the Commission shall, in particular, take account of the following elements: (a) the rule of law, respect for human rights and fundamental freedoms, relevant legislation, both general and sectoral, including concerning public security, defence, national security and criminal law and the access of public authorities to personal data, as well as the implementation of such legislation, data protection rules, professional rules and security measures, including rules for the onward transfer of personal data to another third country or international organisation whi...
  • Loss: MatryoshkaLoss with these parameters:
    {
        "loss": "MultipleNegativesRankingLoss",
        "matryoshka_dims": [
            768,
            512,
            256,
            128,
            64
        ],
        "matryoshka_weights": [
            1,
            1,
            1,
            1,
            1
        ],
        "n_dims_per_step": -1
    }
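MultipleNegativesRankingLoss treats every other positive in the batch as a negative for a given anchor: it scores all anchor/positive pairs with scaled cosine similarity and applies cross-entropy with the matching pair as the target. MatryoshkaLoss repeats that computation on embeddings truncated to each size in `matryoshka_dims` and sums the results using `matryoshka_weights` (with `n_dims_per_step: -1`, every size is used at every step). The NumPy sketch below illustrates that arithmetic only. The scale of 20 is an assumption (it is the Sentence Transformers default and is not stated in this card), and the function names are hypothetical rather than the library's API.

```python
import numpy as np


def _unit(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def mnrl(anchors: np.ndarray, positives: np.ndarray, scale: float = 20.0) -> float:
    """In-batch-negatives loss: row i of `positives` is the target for anchor i."""
    scores = scale * _unit(anchors) @ _unit(positives).T  # (batch, batch) scaled cosine
    scores = scores - scores.max(axis=1, keepdims=True)    # numerical stability
    log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))             # cross-entropy on the diagonal


def matryoshka_mnrl(anchors, positives, dims=(768, 512, 256, 128, 64), weights=(1, 1, 1, 1, 1)):
    """Weighted sum of the in-batch loss over embeddings truncated to each size."""
    return sum(w * mnrl(anchors[:, :d], positives[:, :d]) for d, w in zip(dims, weights))


rng = np.random.default_rng(0)
anchors = rng.normal(size=(8, 768))
positives = anchors + 0.1 * rng.normal(size=(8, 768))  # positives close to their anchors
random_pos = rng.normal(size=(8, 768))                 # unrelated "positives"

print(matryoshka_mnrl(anchors, positives))   # small: each anchor picks out its own positive
print(matryoshka_mnrl(anchors, random_pos))  # larger: anchors cannot tell their positive apart
```

Because each truncated prefix gets its own loss term, the first 64 or 128 dimensions are pushed to carry the most useful signal, which is what makes truncation at inference time viable.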
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: epoch
  • gradient_accumulation_steps: 2
  • learning_rate: 2e-05
  • num_train_epochs: 15
  • lr_scheduler_type: cosine
  • warmup_ratio: 0.1
  • bf16: True
  • tf32: True
  • load_best_model_at_end: True
  • optim: adamw_torch_fused
  • batch_sampler: no_duplicates
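With `per_device_train_batch_size` 8 (listed under All Hyperparameters) and `gradient_accumulation_steps` 2, each optimizer step sees 16 pairs, so 1,580 samples give about 99 steps per epoch, consistent with the first evaluation at step 99 in the training log. The sketch below traces a linear-warmup, half-cosine-decay schedule over the configured 15 epochs, assuming 99 steps per epoch on a single device. It follows the common warmup-then-cosine formula and is an illustration, not the trainer's own scheduler code.

```python
import math

BASE_LR = 2e-05
EPOCHS = 15
STEPS_PER_EPOCH = 99  # assumption: 1,580 samples / 8 per batch = 198 batches, / 2 accumulation
WARMUP_RATIO = 0.1

TOTAL_STEPS = EPOCHS * STEPS_PER_EPOCH                 # 1485
WARMUP_STEPS = math.ceil(WARMUP_RATIO * TOTAL_STEPS)   # 149


def lr_at(step: int) -> float:
    """Linear warmup from 0 to BASE_LR, then cosine decay to 0 at TOTAL_STEPS."""
    if step < WARMUP_STEPS:
        return BASE_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return BASE_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


for step in (0, 74, WARMUP_STEPS, 500, 1000, TOTAL_STEPS):
    print(step, f"{lr_at(step):.2e}")
```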

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: epoch
  • prediction_loss_only: True
  • per_device_train_batch_size: 8
  • per_device_eval_batch_size: 8
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 2
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 15
  • max_steps: -1
  • lr_scheduler_type: cosine
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: True
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: True
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • tp_size: 0
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional
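`batch_sampler: no_duplicates` matters for MultipleNegativesRankingLoss: if the same text appeared twice in one batch, one copy would be scored as a negative for the other. A simplified greedy sketch of such a sampler, assuming each example is an (anchor, positive) pair of strings; the function name and toy pairs are hypothetical, and the library's sampler differs in details such as shuffling and seeding.

```python
def no_duplicate_batches(pairs: list[tuple[str, str]], batch_size: int) -> list[list[int]]:
    """Greedily group example indices so that no text repeats within a batch."""
    remaining = list(range(len(pairs)))
    batches = []
    while remaining:
        batch, seen, deferred = [], set(), []
        for idx in remaining:
            texts = set(pairs[idx])
            if len(batch) < batch_size and not (texts & seen):
                batch.append(idx)
                seen |= texts
            else:
                deferred.append(idx)  # retry in a later batch
        batches.append(batch)
        remaining = deferred
    return batches


pairs = [
    ("q1", "Article 77"),
    ("q2", "Article 79"),
    ("q3", "Article 77"),  # same positive as q1, so it must go to another batch
    ("q4", "Article 36"),
]
print(no_duplicate_batches(pairs, batch_size=4))  # [[0, 1, 3], [2]]
```

The first remaining example always fits into an empty batch, so every pass makes progress and the loop terminates.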

Training Logs

Epoch Step Training Loss dim_768_cosine_ndcg@10 dim_512_cosine_ndcg@10 dim_256_cosine_ndcg@10 dim_128_cosine_ndcg@10 dim_64_cosine_ndcg@10
0.1010 10 15.676 - - - - -
0.2020 20 15.319 - - - - -
0.3030 30 14.9757 - - - - -
0.4040 40 13.2445 - - - - -
0.5051 50 11.1148 - - - - -
0.6061 60 10.5683 - - - - -
0.7071 70 9.8032 - - - - -
0.8081 80 8.894 - - - - -
0.9091 90 8.8733 - - - - -
1.0 99 - 0.3214 0.3206 0.2969 0.2776 0.2518
1.0101 100 8.8753 - - - - -
1.1111 110 6.7814 - - - - -
1.2121 120 6.436 - - - - -
1.3131 130 6.02 - - - - -
1.4141 140 7.5173 - - - - -
1.5152 150 6.3509 - - - - -
1.6162 160 5.9486 - - - - -
1.7172 170 5.6732 - - - - -
1.8182 180 5.2878 - - - - -
1.9192 190 5.2841 - - - - -
2.0 198 - 0.3956 0.4048 0.3743 0.3492 0.2884
2.0202 200 5.3054 - - - - -
2.1212 210 2.8915 - - - - -
2.2222 220 4.0363 - - - - -
2.3232 230 4.0412 - - - - -
2.4242 240 4.0101 - - - - -
2.5253 250 3.8038 - - - - -
2.6263 260 3.5217 - - - - -
2.7273 270 3.143 - - - - -
2.8283 280 5.5051 - - - - -
2.9293 290 3.2826 - - - - -
3.0 297 - 0.4042 0.3981 0.3909 0.3646 0.3170
3.0303 300 3.0156 - - - - -
3.1313 310 2.2537 - - - - -
3.2323 320 3.3127 - - - - -
3.3333 330 2.5861 - - - - -
3.4343 340 1.7786 - - - - -
3.5354 350 2.5512 - - - - -
3.6364 360 2.0074 - - - - -
3.7374 370 2.4396 - - - - -
3.8384 380 2.6935 - - - - -
3.9394 390 1.8119 - - - - -
4.0101 397 - 0.4304 0.4282 0.4139 0.3951 0.3643
4.0303 400 2.3398 - - - - -
4.1313 410 1.6697 - - - - -
4.2323 420 1.3835 - - - - -
4.3333 430 1.7774 - - - - -
4.4343 440 1.6399 - - - - -
4.5354 450 1.7386 - - - - -
4.6364 460 2.3151 - - - - -
4.7374 470 1.9067 - - - - -
4.8384 480 1.9133 - - - - -
4.9394 490 2.2215 - - - - -
5.0 496 - 0.4255 0.4204 0.4210 0.4062 0.3682
5.0404 500 1.898 - - - - -
5.1414 510 1.396 - - - - -
5.2424 520 0.8949 - - - - -
5.3434 530 1.4482 - - - - -
5.4444 540 1.6391 - - - - -
5.5455 550 1.9564 - - - - -
5.6465 560 1.2331 - - - - -
5.7475 570 1.813 - - - - -
5.8485 580 1.4363 - - - - -
5.9495 590 1.3519 - - - - -
6.0 595 - 0.4254 0.4294 0.4212 0.4196 0.3934
6.0505 600 1.1575 - - - - -
6.1515 610 0.9375 - - - - -
6.2525 620 0.9556 - - - - -
6.3535 630 1.7873 - - - - -
6.4545 640 0.6363 - - - - -
6.5556 650 0.7925 - - - - -
6.6566 660 1.5787 - - - - -
6.7576 670 1.274 - - - - -
6.8586 680 1.3011 - - - - -
6.9596 690 0.7303 - - - - -
7.0 694 - 0.4317 0.4452 0.4301 0.4284 0.4019
7.0606 700 0.6973 - - - - -
7.1616 710 0.6512 - - - - -
7.2626 720 0.5386 - - - - -
7.3636 730 0.6079 - - - - -
7.4646 740 1.1747 - - - - -
7.5657 750 1.1719 - - - - -
7.6667 760 0.5889 - - - - -
7.7677 770 0.8939 - - - - -
7.8687 780 1.0032 - - - - -
7.9697 790 0.5862 - - - - -
8.0 793 - 0.4456 0.4412 0.4305 0.4299 0.3954
8.0707 800 0.8925 - - - - -
8.1717 810 1.2382 - - - - -
8.2727 820 0.6373 - - - - -
8.3737 830 0.9514 - - - - -
8.4747 840 0.4652 - - - - -
8.5758 850 0.9173 - - - - -
8.6768 860 1.0672 - - - - -
8.7778 870 0.4503 - - - - -
8.8788 880 0.5905 - - - - -
8.9798 890 0.7086 - - - - -
9.0 892 - 0.4299 0.4240 0.4279 0.4073 0.3947
9.0808 900 0.3295 - - - - -
9.1818 910 0.6795 - - - - -
9.2828 920 0.6485 - - - - -
9.3838 930 0.3027 - - - - -
9.4848 940 0.3273 - - - - -
9.5859 950 1.3033 - - - - -
9.6869 960 0.3657 - - - - -
9.7879 970 0.6145 - - - - -
9.8889 980 0.4529 - - - - -
9.9899 990 0.6022 - - - - -
10.0 991 - 0.4425 0.4340 0.4291 0.4181 0.4115
-1 -1 - 0.4456 0.4412 0.4305 0.4299 0.3954
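  • The saved checkpoint (load_best_model_at_end) is the epoch 8.0 evaluation at step 793; the final row, with epoch and step -1, repeats its scores. A "-" marks a value that was not computed at that step.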

Framework Versions

  • Python: 3.11.13
  • Sentence Transformers: 4.1.0
  • Transformers: 4.51.3
  • PyTorch: 2.6.0+cu124
  • Accelerate: 1.8.1
  • Datasets: 4.0.0
  • Tokenizers: 0.21.2

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MatryoshkaLoss

@misc{kusupati2024matryoshka,
    title={Matryoshka Representation Learning},
    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year={2024},
    eprint={2205.13147},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
Model size: 109M params (Safetensors, tensor type F32)