
ModernBERT Japan Legal - Fine-tuned Model

This model is a fine-tuned version of sbintuitions/modernbert-ja-130m on Japanese legal case data for research purposes.

Model Details

  • Base Model: ModernBERT-Ja-130M (sbintuitions/modernbert-ja-130m)
  • Model Size: 133M parameters (BF16 safetensors)
  • Training Data: 65,855 Japanese legal cases (1947-2024)
  • Task: Masked Language Modeling (MLM) for legal domain adaptation (see the sketch below)
  • Domain: Japanese legal text
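
For reference, MLM-based domain adaptation of this kind can be reproduced with the standard Hugging Face masked-LM training setup. The sketch below is illustrative only, not the authors' actual training recipe; the corpus file name and all hyperparameters are assumptions:

from datasets import load_dataset
from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

base = "sbintuitions/modernbert-ja-130m"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForMaskedLM.from_pretrained(base)

# Hypothetical corpus: one case document per line in a plain-text file
dataset = load_dataset("text", data_files={"train": "legal_cases.txt"})["train"]

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

# Dynamic masking with the usual 15% masking probability
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=True, mlm_probability=0.15)

args = TrainingArguments(
    output_dir="modernbert-ja-legal-mlm",
    per_device_train_batch_size=16,
    num_train_epochs=3,
    bf16=True,
)
trainer = Trainer(model=model, args=args, train_dataset=tokenized, data_collator=collator)
trainer.train()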

How to Use

You can use this model directly with the transformers library v4.48.0 or higher:

pip install -U "transformers>=4.48.0"

Additionally, if your GPU supports Flash Attention 2, we recommend installing it and using it with this model:

pip install flash-attn --no-build-isolation
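
Once flash-attn is installed, you can request it when loading the model. This is a minimal sketch; it assumes a GPU that supports Flash Attention 2 and half-precision weights, and transformers will raise an error if the package is missing:

import torch
from transformers import AutoModelForMaskedLM

# attn_implementation="flash_attention_2" requires the flash-attn package and a supported GPU
model = AutoModelForMaskedLM.from_pretrained(
    "nguyenthanhasia/modernbert-ja-legal",
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
)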

Example Usage

import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer, pipeline

# Load the fine-tuned masked LM and its tokenizer, then build a fill-mask pipeline
model = AutoModelForMaskedLM.from_pretrained("nguyenthanhasia/modernbert-ja-legal", torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained("nguyenthanhasia/modernbert-ja-legal")
fill_mask = pipeline("fill-mask", model=model, tokenizer=tokenizer)

# Predict the masked token ("Good morning, today's weather is <mask>.")
results = fill_mask("おはようございます、今日の天気は<mask>です。")

for result in results:
    print(result)
# {'score': 0.5078125, 'token': 16416, 'token_str': '晴れ', 'sequence': 'おはようございます、今日の天気は晴れです。'}
# {'score': 0.240234375, 'token': 28933, 'token_str': '曇り', 'sequence': 'おはようございます、今日の天気は曇りです。'}
# {'score': 0.078125, 'token': 92339, 'token_str': 'くもり', 'sequence': 'おはようございます、今日の天気はくもりです。'}
# {'score': 0.078125, 'token': 2988, 'token_str': '雨', 'sequence': 'おはようございます、今日の天気は雨です。'}
# {'score': 0.0223388671875, 'token': 52525, 'token_str': '快晴', 'sequence': 'おはようございます、今日の天気は快晴です。'}
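
Since the model was adapted on case law, you can also probe it with legal-style prompts. The sentence below ("The defendant is sentenced to <mask>.") is only an illustrative prompt; the predictions your run produces depend on the model and are not reproduced here:

# Probe the legal domain: 被告人を<mask>に処する。("The defendant is sentenced to <mask>.")
for result in fill_mask("被告人を<mask>に処する。"):
    print(result["token_str"], result["score"])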

Intended Use and Limitations

This model is intended for research purposes on Japanese legal texts. It can be used for experiments on domain adaptation and benchmarking legal NLP tasks.
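
For downstream experiments such as classification or retrieval over case texts, the encoder can also be used as a feature extractor. The following is a minimal sketch assuming mean pooling over the final hidden states; the pooling strategy and example sentences are illustrative choices, not part of this model card:

import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("nguyenthanhasia/modernbert-ja-legal")
encoder = AutoModel.from_pretrained("nguyenthanhasia/modernbert-ja-legal")

# Illustrative judgment-style sentences:
# "The plaintiff's claim is dismissed." / "The defendant shall pay money to the plaintiff."
sentences = ["原告の請求を棄却する。", "被告は原告に対し金員を支払え。"]
inputs = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    hidden = encoder(**inputs).last_hidden_state  # (batch, seq_len, hidden_size)

# Mean-pool over non-padding tokens to get one vector per sentence
mask = inputs["attention_mask"].unsqueeze(-1)
embeddings = (hidden * mask).sum(dim=1) / mask.sum(dim=1)
print(embeddings.shape)  # (num_sentences, hidden_size)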

Limitations:

  • Domain specificity: Trained only on Japanese legal text; performance on other domains or languages is not guaranteed.
  • Training data bias: Predictions reflect the composition of the 1947-2024 case corpus and any biases it contains.
  • Research use only: Not suitable for critical applications or as a substitute for legal advice.
  • Inherits the limitations of the base ModernBERT-Ja-130M model.