
ModernBERT Japan Legal - Fine-tuned Model

This model is a fine-tuned version of sbintuitions/modernbert-ja-130m on Japanese legal case data for research purposes.

Model Details

  • Base Model: ModernBERT-Ja-130M (sbintuitions/modernbert-ja-130m)
  • Model Size: 133M parameters (BF16 safetensors)
  • Training Data: 65,855 Japanese legal cases (1947-2024)
  • Task: Masked Language Modeling (MLM) for legal domain adaptation (see the sketch below)
  • Domain: Japanese legal text
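
For reference, MLM-based domain adaptation of this kind can be reproduced with the standard Hugging Face masked-LM training setup. The sketch below is illustrative only, not the authors' actual training recipe; the corpus file name and all hyperparameters are assumptions:

from datasets import load_dataset
from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

base = "sbintuitions/modernbert-ja-130m"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForMaskedLM.from_pretrained(base)

# Hypothetical corpus: one case document per line in a plain-text file
dataset = load_dataset("text", data_files={"train": "legal_cases.txt"})["train"]

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

# Dynamic masking with the usual 15% masking probability
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=True, mlm_probability=0.15)

args = TrainingArguments(
    output_dir="modernbert-ja-legal-mlm",
    per_device_train_batch_size=16,
    num_train_epochs=3,
    bf16=True,
)
trainer = Trainer(model=model, args=args, train_dataset=tokenized, data_collator=collator)
trainer.train()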

How to Use

You can use this model directly with the transformers library v4.48.0 or higher:

pip install -U "transformers>=4.48.0"

Additionally, if your GPU supports Flash Attention 2, we recommend installing it and using it with this model:

pip install flash-attn --no-build-isolation
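
Once flash-attn is installed, you can request it when loading the model. This is a minimal sketch; it assumes a GPU that supports Flash Attention 2 and half-precision weights, and transformers will raise an error if the package is missing:

import torch
from transformers import AutoModelForMaskedLM

# attn_implementation="flash_attention_2" requires the flash-attn package and a supported GPU
model = AutoModelForMaskedLM.from_pretrained(
    "nguyenthanhasia/modernbert-ja-legal",
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
)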

Example Usage

import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer, pipeline

# Load the fine-tuned masked LM and its tokenizer, then build a fill-mask pipeline
model = AutoModelForMaskedLM.from_pretrained("nguyenthanhasia/modernbert-ja-legal", torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained("nguyenthanhasia/modernbert-ja-legal")
fill_mask = pipeline("fill-mask", model=model, tokenizer=tokenizer)

# Predict the masked token ("Good morning, today's weather is <mask>.")
results = fill_mask("おはようございます、今日の天気は<mask>です。")

for result in results:
    print(result)
# {'score': 0.5078125, 'token': 16416, 'token_str': '晴れ', 'sequence': 'おはようございます、今日の天気は晴れです。'}
# {'score': 0.240234375, 'token': 28933, 'token_str': '曇り', 'sequence': 'おはようございます、今日の天気は曇りです。'}
# {'score': 0.078125, 'token': 92339, 'token_str': 'くもり', 'sequence': 'おはようございます、今日の天気はくもりです。'}
# {'score': 0.078125, 'token': 2988, 'token_str': '雨', 'sequence': 'おはようございます、今日の天気は雨です。'}
# {'score': 0.0223388671875, 'token': 52525, 'token_str': '快晴', 'sequence': 'おはようございます、今日の天気は快晴です。'}
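
Since the model was adapted on case law, you can also probe it with legal-style prompts. The sentence below ("The defendant is sentenced to <mask>.") is only an illustrative prompt; the predictions your run produces depend on the model and are not reproduced here:

# Probe the legal domain: 被告人を<mask>に処する。("The defendant is sentenced to <mask>.")
for result in fill_mask("被告人を<mask>に処する。"):
    print(result["token_str"], result["score"])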

Intended Use and Limitations

This model is intended for research purposes on Japanese legal texts. It can be used for experiments on domain adaptation and benchmarking legal NLP tasks.
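
For downstream experiments such as classification or retrieval over case texts, the encoder can also be used as a feature extractor. The following is a minimal sketch assuming mean pooling over the final hidden states; the pooling strategy and example sentences are illustrative choices, not part of this model card:

import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("nguyenthanhasia/modernbert-ja-legal")
encoder = AutoModel.from_pretrained("nguyenthanhasia/modernbert-ja-legal")

# Illustrative judgment-style sentences:
# "The plaintiff's claim is dismissed." / "The defendant shall pay money to the plaintiff."
sentences = ["原告の請求を棄却する。", "被告は原告に対し金員を支払え。"]
inputs = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    hidden = encoder(**inputs).last_hidden_state  # (batch, seq_len, hidden_size)

# Mean-pool over non-padding tokens to get one vector per sentence
mask = inputs["attention_mask"].unsqueeze(-1)
embeddings = (hidden * mask).sum(dim=1) / mask.sum(dim=1)
print(embeddings.shape)  # (num_sentences, hidden_size)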

Limitations:

  • Domain specificity: Trained only on Japanese legal text; performance on other domains or languages is not guaranteed.
  • Training data bias: Predictions reflect the composition of the 1947-2024 case corpus and any biases it contains.
  • Research use only: Not suitable for critical applications or as a substitute for legal advice.
  • Inherits the limitations of the base ModernBERT-Ja-130M model.