Legal BERT Base Uncased Legal Matryoshka

This is a sentence-transformers model finetuned from nlpaueb/legal-bert-base-uncased on the json dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: nlpaueb/legal-bert-base-uncased
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity
  • Training Dataset:
    • json
  • Language: en
  • License: apache-2.0

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
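
For readers who want to see what the two modules above do, here is a minimal sketch that reproduces the pipeline with plain transformers: tokenize with a 512-token limit, run BertModel, then mean-pool the token embeddings over the attention mask (pooling_mode_mean_tokens). It assumes, as is usual for Sentence Transformers repositories, that the transformer weights sit at the root of the repo so AutoModel can load them directly.

import torch
from transformers import AutoModel, AutoTokenizer

repo = "IoannisKat1/legal-bert-base-uncased-legal-matryoshka"
tokenizer = AutoTokenizer.from_pretrained(repo)
bert = AutoModel.from_pretrained(repo)

def embed(texts):
    # (0) Transformer: max_seq_length=512; lower-casing is handled by the uncased tokenizer itself
    batch = tokenizer(texts, padding=True, truncation=True, max_length=512, return_tensors="pt")
    with torch.no_grad():
        token_embeddings = bert(**batch).last_hidden_state      # [batch, seq_len, 768]
    # (1) Pooling: mean over non-padding tokens (pooling_mode_mean_tokens=True)
    mask = batch["attention_mask"].unsqueeze(-1).float()
    return (token_embeddings * mask).sum(dim=1) / mask.sum(dim=1)

print(embed(["What should be established by law?"]).shape)      # torch.Size([1, 768])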

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("IoannisKat1/legal-bert-base-uncased-legal-matryoshka")
# Run inference
sentences = [
    'What should be established by law?',
    'Where a Member State establishes several supervisory authorities, it should establish by law mechanisms for ensuring the effective participation of those supervisory authorities in the consistency mechanism. That Member State should in particular designate the supervisory authority which functions as a single contact point for the effective participation of those authorities in the mechanism, to ensure swift and smooth cooperation with other supervisory authorities, the Board and the Commission.',
    "Any person who intentionally produces, distributes, publishes, imports or exports, transfers, offers, sells or in other way distributes, supplies with, purchases, obtains, acquires or owns child pornographic material or spreads or broadcasts information concerning executions of such actions, is sentenced to at least one year’s imprisonment and a fine of ten to one hundred thousand Euros.\nAny person who intentionally produces, offers, sells or in any way distributes, transfers, purchases, obtains or acquires child pornographic material or broadcasts information concerning the executions of such actions through a computer system or through the Internet is sentenced to at least two years’ imprisonment and a fine of fifty to three hundred thousand Euros.\nPornographic material in the sense of the above mentioned paragraphs consists of any representation or an actual or virtual depiction, in electronic or any other form of material, of the body of or part of the body of a minor, aimed at causing sexual stimulation, as well as a recording or depiction of an actual or virtual carnal act that arises sexual stimulation by or with a minor.\nActions of the first and second paragraph are punishable by imprisonment of up to ten years and a fine of fifty to one hundred thousand Euros if: are professionally or habitually committed; the production of child pornographic material is connected to the exploiting of the need, mental or intellectual weakness or corporal dysfunction of the minor due to organic disease or by exercise or threat of violence or using a minor under the age of fifteen.\nIf such an act as described in case b) resulted in grievous bodily harm to the victim, it will entail a sentence of at least ten years' imprisonment and a fine of one hundred thousand to five hundred thousand Euros. If, however, such an act resulted in the victim’s death, then life imprisonment is imposed.\n",
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
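
Because the model was trained with MatryoshkaLoss over the dimensions 768, 512, 256, 128 and 64 (see Training Details below), the embeddings can be truncated to a smaller size with only a modest drop in retrieval quality. A minimal sketch, assuming a sentence-transformers release that supports the truncate_dim argument (2.7.0 or later):

from sentence_transformers import SentenceTransformer

# Load the model so that encode() returns 256-dimensional embeddings
model = SentenceTransformer(
    "IoannisKat1/legal-bert-base-uncased-legal-matryoshka",
    truncate_dim=256,
)

sentences = [
    "What should be established by law?",
    "Where a Member State establishes several supervisory authorities, it should establish by law mechanisms for ensuring their effective participation in the consistency mechanism.",
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (2, 256)

# Similarity works on the truncated vectors exactly as before
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# (2, 2)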

Evaluation

Metrics

Each table below reports the same retrieval metrics computed with embeddings truncated to one of the Matryoshka dimensions used during training: 768, 512, 256, 128 and 64.

Information Retrieval (dim_768)

Metric Value
cosine_accuracy@1 0.4293
cosine_accuracy@3 0.4773
cosine_accuracy@5 0.5101
cosine_accuracy@10 0.5581
cosine_precision@1 0.4293
cosine_precision@3 0.4259
cosine_precision@5 0.4051
cosine_precision@10 0.3664
cosine_recall@1 0.0752
cosine_recall@3 0.2024
cosine_recall@5 0.2746
cosine_recall@10 0.3948
cosine_ndcg@10 0.4898
cosine_mrr@10 0.4592
cosine_map@100 0.5444

Information Retrieval (dim_512)

Metric Value
cosine_accuracy@1 0.4318
cosine_accuracy@3 0.4722
cosine_accuracy@5 0.5025
cosine_accuracy@10 0.5505
cosine_precision@1 0.4318
cosine_precision@3 0.4276
cosine_precision@5 0.404
cosine_precision@10 0.3664
cosine_recall@1 0.0736
cosine_recall@3 0.2003
cosine_recall@5 0.2711
cosine_recall@10 0.3951
cosine_ndcg@10 0.4888
cosine_mrr@10 0.4594
cosine_map@100 0.5398

Information Retrieval (dim_256)

Metric Value
cosine_accuracy@1 0.4419
cosine_accuracy@3 0.4823
cosine_accuracy@5 0.5177
cosine_accuracy@10 0.5581
cosine_precision@1 0.4419
cosine_precision@3 0.4335
cosine_precision@5 0.4136
cosine_precision@10 0.376
cosine_recall@1 0.0769
cosine_recall@3 0.2004
cosine_recall@5 0.2718
cosine_recall@10 0.3917
cosine_ndcg@10 0.4978
cosine_mrr@10 0.4682
cosine_map@100 0.5495

Information Retrieval (dim_128)

Metric Value
cosine_accuracy@1 0.4394
cosine_accuracy@3 0.4747
cosine_accuracy@5 0.5101
cosine_accuracy@10 0.5581
cosine_precision@1 0.4394
cosine_precision@3 0.4293
cosine_precision@5 0.4116
cosine_precision@10 0.3803
cosine_recall@1 0.0736
cosine_recall@3 0.191
cosine_recall@5 0.2619
cosine_recall@10 0.3906
cosine_ndcg@10 0.4968
cosine_mrr@10 0.4652
cosine_map@100 0.5413

Information Retrieval (dim_64)

Metric Value
cosine_accuracy@1 0.4091
cosine_accuracy@3 0.4571
cosine_accuracy@5 0.4848
cosine_accuracy@10 0.5303
cosine_precision@1 0.4091
cosine_precision@3 0.4074
cosine_precision@5 0.3919
cosine_precision@10 0.3581
cosine_recall@1 0.0663
cosine_recall@3 0.1797
cosine_recall@5 0.2507
cosine_recall@10 0.3628
cosine_ndcg@10 0.4667
cosine_mrr@10 0.4371
cosine_map@100 0.5119
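
The metric names above match the output of the sentence-transformers InformationRetrievalEvaluator. The evaluation queries, corpus and relevance judgments are not reproduced in this card, but the sketch below, using a toy query and corpus, shows how metrics of this kind can be computed:

from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import InformationRetrievalEvaluator

model = SentenceTransformer("IoannisKat1/legal-bert-base-uncased-legal-matryoshka")

# Toy data; the real evaluation split is not part of this card
queries = {"q1": "What should be established by law?"}
corpus = {
    "d1": "Where a Member State establishes several supervisory authorities, it should establish by law mechanisms for their participation in the consistency mechanism.",
    "d2": "Controllers who fail to comply with Authority decisions face imprisonment and a fine.",
}
relevant_docs = {"q1": {"d1"}}  # query id -> set of relevant document ids

evaluator = InformationRetrievalEvaluator(queries, corpus, relevant_docs, name="toy")
results = evaluator(model)
print(results)  # accuracy@k, precision@k, recall@k, ndcg@10, mrr@10, map@100 under cosine similarity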

Training Details

Training Dataset

json

  • Dataset: json
  • Size: 1,580 training samples
  • Columns: anchor and positive
  • Approximate statistics based on the first 1000 samples:
    • anchor: string, min 7 / mean 15.34 / max 36 tokens
    • positive: string, min 25 / mean 354.97 / max 512 tokens
  • Samples:
    • Anchor: What are the consequences for unlawful interference with sensitive data?
      Positive: Failure to notify the Authority of file establishment or permit changes is punished by up to three years’ imprisonment and a fine of one to five million Drachmas.
      Maintaining a file without a permit or violating permit terms is punished by at least one year’s imprisonment and a fine of one to five million Drachmas.
      Unauthorized file interconnection or without permit is punished by up to three years’ imprisonment and a fine of one to five million Drachmas.
      Unlawful interference with personal data is punished by imprisonment and a fine; for sensitive data, at least one year’s imprisonment and a fine of one to ten million Drachmas.
      Controllers who fail to comply with Authority decisions or violate data transfer rules face at least two years’ imprisonment and a fine of one to five million Drachmas.
      If acts were committed for unlawful benefit or to cause harm, punishment is up to ten years’ imprisonment and a fine of two to ten million Drachmas.
      If acts jeopardize democratic governance or n...
    • Anchor: What purposes could justify the controller being a private entity?
      Positive: Where processing is carried out in accordance with a legal obligation to which the controller is subject or where processing is necessary for the performance of a task carried out in the public interest or in the exercise of official authority, the processing should have a basis in Union or Member State law. This Regulation does not require a specific law for each individual processing. A law as a basis for several processing operations based on a legal obligation to which the controller is subject or where processing is necessary for the performance of a task carried out in the public interest or in the exercise of an official authority may be sufficient. It should also be for Union or Member State law to determine the purpose of processing. Furthermore, that law could specify the general conditions of this Regulation governing the lawfulness of personal data processing, establish specifications for determining the controller, the type of personal data which are subject to the process...
    • Anchor: What conditions need to be fulfilled by the independent supervisory authority overseeing churches and religious associations?
      Positive: 1.Where in a Member State, churches and religious associations or communities apply, at the time of entry into force of this Regulation, comprehensive rules relating to the protection of natural persons with regard to processing, such rules may continue to apply, provided that they are brought into line with this Regulation.
      2.Churches and religious associations which apply comprehensive rules in accordance with paragraph 1 of this Article shall be subject to the supervision of an independent supervisory authority, which may be specific, provided that it fulfils the conditions laid down in Chapter VI of this Regulation. CHAPTER X Delegated acts and implementing acts
  • Loss: MatryoshkaLoss with these parameters:
    {
        "loss": "MultipleNegativesRankingLoss",
        "matryoshka_dims": [
            768,
            512,
            256,
            128,
            64
        ],
        "matryoshka_weights": [
            1,
            1,
            1,
            1,
            1
        ],
        "n_dims_per_step": -1
    }
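
For reference, the configuration above corresponds roughly to the following loss construction (a sketch; the actual training script is not included in this card):

from sentence_transformers import SentenceTransformer
from sentence_transformers.losses import MatryoshkaLoss, MultipleNegativesRankingLoss

model = SentenceTransformer("nlpaueb/legal-bert-base-uncased")

# Inner loss: in-batch negatives over (anchor, positive) pairs
inner_loss = MultipleNegativesRankingLoss(model)

# Outer loss: apply the same objective at every truncated dimensionality
loss = MatryoshkaLoss(
    model,
    inner_loss,
    matryoshka_dims=[768, 512, 256, 128, 64],
    matryoshka_weights=[1, 1, 1, 1, 1],
    n_dims_per_step=-1,
)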
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: epoch
  • gradient_accumulation_steps: 2
  • learning_rate: 2e-05
  • num_train_epochs: 15
  • lr_scheduler_type: cosine
  • warmup_ratio: 0.1
  • bf16: True
  • tf32: True
  • load_best_model_at_end: True
  • optim: adamw_torch_fused
  • batch_sampler: no_duplicates

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: epoch
  • prediction_loss_only: True
  • per_device_train_batch_size: 8
  • per_device_eval_batch_size: 8
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 2
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 15
  • max_steps: -1
  • lr_scheduler_type: cosine
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: True
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: True
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • tp_size: 0
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss dim_768_cosine_ndcg@10 dim_512_cosine_ndcg@10 dim_256_cosine_ndcg@10 dim_128_cosine_ndcg@10 dim_64_cosine_ndcg@10
0.1010 10 15.9718 - - - - -
0.2020 20 16.7853 - - - - -
0.3030 30 14.697 - - - - -
0.4040 40 13.9906 - - - - -
0.5051 50 14.0258 - - - - -
0.6061 60 12.5485 - - - - -
0.7071 70 11.0342 - - - - -
0.8081 80 10.2744 - - - - -
0.9091 90 8.8141 - - - - -
1.0 99 - 0.3632 0.3783 0.3431 0.3077 0.2759
1.0101 100 9.9451 - - - - -
1.1111 110 7.4968 - - - - -
1.2121 120 7.1261 - - - - -
1.3131 130 6.0039 - - - - -
1.4141 140 6.5453 - - - - -
1.5152 150 5.9298 - - - - -
1.6162 160 7.4043 - - - - -
1.7172 170 6.3976 - - - - -
1.8182 180 8.9042 - - - - -
1.9192 190 5.2542 - - - - -
2.0 198 - 0.3951 0.3888 0.3774 0.3611 0.3264
2.0202 200 5.0081 - - - - -
2.1212 210 5.7284 - - - - -
2.2222 220 4.2062 - - - - -
2.3232 230 3.9454 - - - - -
2.4242 240 3.5888 - - - - -
2.5253 250 3.7057 - - - - -
2.6263 260 3.4574 - - - - -
2.7273 270 4.1998 - - - - -
2.8283 280 4.3571 - - - - -
2.9293 290 3.0049 - - - - -
3.0 297 - 0.4155 0.4083 0.4056 0.4043 0.3717
3.0303 300 4.0507 - - - - -
3.1313 310 2.4514 - - - - -
3.2323 320 3.6131 - - - - -
3.3333 330 2.6191 - - - - -
3.4343 340 2.4375 - - - - -
3.5354 350 1.7928 - - - - -
3.6364 360 2.4522 - - - - -
3.7374 370 2.4557 - - - - -
3.8384 380 2.8036 - - - - -
3.9394 390 2.694 - - - - -
4.0 396 - 0.4491 0.4509 0.4484 0.4204 0.3830
4.0404 400 2.3715 - - - - -
4.1414 410 1.5032 - - - - -
4.2424 420 1.711 - - - - -
4.3434 430 1.7695 - - - - -
4.4444 440 2.2982 - - - - -
4.5455 450 1.6361 - - - - -
4.6465 460 2.3351 - - - - -
4.7475 470 1.6405 - - - - -
4.8485 480 1.0239 - - - - -
4.9495 490 1.6597 - - - - -
5.0 495 - 0.4354 0.4431 0.4320 0.4195 0.3964
5.0505 500 1.3434 - - - - -
5.1515 510 1.3611 - - - - -
5.2525 520 1.2637 - - - - -
5.3535 530 1.4342 - - - - -
5.4545 540 1.3777 - - - - -
5.5556 550 1.2341 - - - - -
5.6566 560 1.2177 - - - - -
5.7576 570 1.814 - - - - -
5.8586 580 1.7181 - - - - -
5.9596 590 1.2835 - - - - -
6.0 594 - 0.4588 0.4591 0.4743 0.4688 0.4174
6.0606 600 1.0944 - - - - -
6.1616 610 1.3022 - - - - -
6.2626 620 1.3066 - - - - -
6.3636 630 1.1161 - - - - -
6.4646 640 1.3089 - - - - -
6.5657 650 1.2599 - - - - -
6.6667 660 1.0028 - - - - -
6.7677 670 0.887 - - - - -
6.8687 680 1.0754 - - - - -
6.9697 690 1.2784 - - - - -
7.0 693 - 0.4627 0.4655 0.4676 0.4554 0.4359
7.0707 700 0.8864 - - - - -
7.1717 710 1.057 - - - - -
7.2727 720 1.3416 - - - - -
7.3737 730 0.5645 - - - - -
7.4747 740 0.6572 - - - - -
7.5758 750 1.0231 - - - - -
7.6768 760 0.7654 - - - - -
7.7778 770 0.8611 - - - - -
7.8788 780 1.3308 - - - - -
7.9798 790 0.6435 - - - - -
8.0 792 - 0.4793 0.4818 0.4767 0.4812 0.4439
8.0808 800 0.7799 - - - - -
8.1818 810 0.6171 - - - - -
8.2828 820 0.9222 - - - - -
8.3838 830 0.6862 - - - - -
8.4848 840 0.3412 - - - - -
8.5859 850 0.6021 - - - - -
8.6869 860 0.9747 - - - - -
8.7879 870 0.7557 - - - - -
8.8889 880 1.1181 - - - - -
8.9899 890 0.6717 - - - - -
9.0 891 - 0.4937 0.4823 0.4963 0.4796 0.4346
9.0909 900 0.4619 - - - - -
9.1919 910 0.5895 - - - - -
9.2929 920 0.618 - - - - -
9.3939 930 0.8326 - - - - -
9.4949 940 0.5188 - - - - -
9.5960 950 0.8664 - - - - -
9.6970 960 0.4766 - - - - -
9.7980 970 0.4169 - - - - -
9.8990 980 0.6648 - - - - -
10.0 990 0.7753 0.4764 0.4750 0.4837 0.4861 0.4444
10.1010 1000 0.347 - - - - -
10.2020 1010 0.1793 - - - - -
10.3030 1020 0.3656 - - - - -
10.4040 1030 0.7847 - - - - -
10.5051 1040 0.6572 - - - - -
10.6061 1050 0.4218 - - - - -
10.7071 1060 0.695 - - - - -
10.8081 1070 0.3104 - - - - -
10.9091 1080 1.0731 - - - - -
11.0 1089 - 0.4848 0.4940 0.4947 0.4858 0.4527
11.0101 1090 0.205 - - - - -
11.1111 1100 0.4321 - - - - -
11.2121 1110 0.3332 - - - - -
11.3131 1120 0.3153 - - - - -
11.4141 1130 0.2791 - - - - -
11.5152 1140 0.358 - - - - -
11.6162 1150 0.3905 - - - - -
11.7172 1160 0.257 - - - - -
11.8182 1170 0.2831 - - - - -
11.9192 1180 0.9309 - - - - -
12.0 1188 - 0.4918 0.4870 0.4975 0.4961 0.4674
12.0202 1190 0.5713 - - - - -
12.1212 1200 0.707 - - - - -
12.2222 1210 0.7112 - - - - -
12.3232 1220 0.6857 - - - - -
12.4242 1230 0.6515 - - - - -
12.5253 1240 0.5293 - - - - -
12.6263 1250 0.1141 - - - - -
12.7273 1260 0.2988 - - - - -
12.8283 1270 0.2778 - - - - -
12.9293 1280 0.3073 - - - - -
13.0 1287 - 0.4836 0.4824 0.4969 0.4863 0.4675
13.0303 1290 0.1673 - - - - -
13.1313 1300 0.2177 - - - - -
13.2323 1310 0.4206 - - - - -
13.3333 1320 0.4412 - - - - -
13.4343 1330 0.3181 - - - - -
13.5354 1340 0.2666 - - - - -
13.6364 1350 0.7927 - - - - -
13.7374 1360 0.2329 - - - - -
13.8384 1370 0.2652 - - - - -
13.9394 1380 0.4054 - - - - -
14.0 1386 - 0.4898 0.4888 0.4978 0.4968 0.4667
14.0404 1390 0.6259 - - - - -
14.1414 1400 0.4173 - - - - -
14.2424 1410 0.5599 - - - - -
14.3434 1420 0.434 - - - - -
14.4444 1430 0.3381 - - - - -
14.5455 1440 0.6903 - - - - -
14.6465 1450 0.3789 - - - - -
14.7475 1460 0.2936 - - - - -
14.8485 1470 0.2499 - - - - -
14.9495 1480 0.188 - - - - -
15.0 1485 - 0.4900 0.4888 0.4991 0.4968 0.4661
-1 -1 - 0.4898 0.4888 0.4978 0.4968 0.4667
  • The bold row denotes the saved checkpoint; in this run that is epoch 14.0 (step 1386), whose metrics are repeated in the final (-1) evaluation row.

Framework Versions

  • Python: 3.11.13
  • Sentence Transformers: 4.1.0
  • Transformers: 4.51.3
  • PyTorch: 2.6.0+cu124
  • Accelerate: 1.8.1
  • Datasets: 4.0.0
  • Tokenizers: 0.21.2

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MatryoshkaLoss

@misc{kusupati2024matryoshka,
    title={Matryoshka Representation Learning},
    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year={2024},
    eprint={2205.13147},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}