PhoBERT Fine-tuned for Vietnamese Legal QA

Model Description

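This model is a fine-tuned version of vinai/phobert-base for extractive question answering over Vietnamese legal text: given a question and a context passage, it predicts the start and end of the answer span within the passage.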

Training Details

Training Data

Dataset: Custom Vietnamese Legal QA dataset
Total QA pairs: 156349
Training samples: 96472
Validation samples: 17025
Categories (separated by semicolons, since several names contain commas): Công nghiệp; Thuế, phí, lệ phí, các khoản thu khác; Đất đai; Dân số, gia đình, trẻ em, bình đẳng giới; Quốc phòng; Hành chính tư pháp; Tài nguyên; Văn hóa, thể thao, du lịch; Giao thông, vận tải; Thông tin, báo chí, xuất bản; Tổ chức chính trị - xã hội, hội; Y tế, dược; Dân tộc; Thống kê; Khoa học, công nghệ; An ninh quốc gia; Tổ chức bộ máy nhà nước; Ngoại giao, điều ước quốc tế; Bổ trợ tư pháp; Tài sản công, nợ công, dự trữ nhà nước; Tố tụng và các phương thức giải quyết tranh chấp; Doanh nghiệp, hợp tác xã; Trật tự, an toàn xã hội
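
The training and validation counts above do not add up to the total number of QA pairs. The short check below makes the split explicit. It uses only the three numbers listed in this section; what happened to the remaining pairs (for example a held-out test set or filtering) is not stated in this card, so the sketch simply reports them as unaccounted.

```python
# Counts copied from the "Training Data" list above.
total_pairs = 156349
train_samples = 96472
val_samples = 17025

# Samples actually used for training and validation.
used = train_samples + val_samples

# Pairs not described anywhere in this card (their purpose is unknown).
unaccounted = total_pairs - used

# Split ratio among the used samples only.
train_share = train_samples / used
val_share = val_samples / used

print(f"train + validation: {used}")
print(f"unaccounted pairs: {unaccounted}")
print(f"train share: {train_share:.1%}, validation share: {val_share:.1%}")
```

Among the samples that are used, the ratio works out to roughly 85% training and 15% validation.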

Training Configuration

Base model: vinai/phobert-base
Learning rate: 2e-05
Training epochs: 3
Batch size: 4
Max sequence length: 256
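
As a rough illustration of what this configuration implies, the sketch below estimates the number of optimizer steps. It assumes a single device, no gradient accumulation, and that the last partial batch is kept; none of these details are stated in this card, and the dictionary keys are illustrative names rather than arguments of any specific API.

```python
import math

# Hyperparameters copied from "Training Configuration"; key names are illustrative.
config = {
    "base_model": "vinai/phobert-base",
    "learning_rate": 2e-5,
    "epochs": 3,
    "batch_size": 4,
    "max_seq_length": 256,
}
train_samples = 96472  # from "Training Data"

# Assumption: one device, no gradient accumulation, partial last batch kept.
steps_per_epoch = math.ceil(train_samples / config["batch_size"])
total_steps = steps_per_epoch * config["epochs"]

print(f"steps per epoch: {steps_per_epoch}")
print(f"total optimizer steps: {total_steps}")
```

With multiple devices or gradient accumulation, the effective batch size would be larger and the step count correspondingly smaller.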

Training Results

Training Loss: 0.6345
Validation F1: 0.6029
Validation Accuracy: 0.9795
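
This card does not say how the F1 score was computed. For reference, one common definition in extractive QA is token-overlap F1 between the predicted and reference answers. The sketch below shows that definition under the assumption of simple whitespace tokenization and lowercasing; `token_f1` is a hypothetical helper and may differ from the metric actually used here.

```python
from collections import Counter


def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 between two answer strings (whitespace tokens, lowercased)."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()

    # If either side is empty, score 1.0 only when both are empty.
    if not pred_tokens or not ref_tokens:
        return float(pred_tokens == ref_tokens)

    # Multiset intersection counts each shared token at most as often as it
    # appears on both sides.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0

    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


print(token_f1("công dân Việt Nam", "công dân Việt Nam từ đủ 16 tuổi"))
```

Under this definition a prediction that covers only part of the reference still earns partial credit, which is one reason F1 and exact-match style scores can diverge widely.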

Usage

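from transformers import AutoTokenizer, AutoModelForQuestionAnswering
import torch

# Load the fine-tuned tokenizer and model from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained("huynguyen251/phobert-legal-qa-v2")
model = AutoModelForQuestionAnswering.from_pretrained("huynguyen251/phobert-legal-qa-v2")
model.eval()

# Question: "Who does this regulation apply to?"
question = "Quy định này áp dụng cho ai?"
# Context: "Youth are Vietnamese citizens from 16 to 30 years of age."
context = "Thanh niên là công dân Việt Nam từ đủ 16 tuổi đến 30 tuổi."

# PhoBERT accepts at most 256 tokens, the same limit used in training,
# so longer inputs are truncated to that length.
inputs = tokenizer(question, context, return_tensors="pt", max_length=256, truncation=True)
with torch.no_grad():
    outputs = model(**inputs)

# Take the most likely start and end positions of the answer span.
# If the predicted end precedes the start, the decoded answer is empty.
start_idx = int(torch.argmax(outputs.start_logits))
end_idx = int(torch.argmax(outputs.end_logits))
answer = tokenizer.decode(inputs["input_ids"][0][start_idx:end_idx + 1], skip_special_tokens=True)
print(f"Answer: {answer}")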
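
Taking the argmax of the start and end logits independently can produce an end position that comes before the start. A common alternative is to score every valid (start, end) pair and keep the best one. The sketch below illustrates this on plain Python lists; `best_span` and its `max_answer_len` default are hypothetical and not part of this model or of the Transformers library.

```python
def best_span(start_logits, end_logits, max_answer_len=30):
    """Return the (start, end) pair with the highest summed logit,
    subject to start <= end and a span of at most max_answer_len tokens."""
    best = (0, 0)
    best_score = float("-inf")
    for s, s_logit in enumerate(start_logits):
        # Only consider end positions at or after the start, within the length cap.
        last = min(s + max_answer_len, len(end_logits))
        for e in range(s, last):
            score = s_logit + end_logits[e]
            if score > best_score:
                best_score = score
                best = (s, e)
    return best


# Toy logits where independent argmax would give start=3, end=0 (invalid).
start_logits = [0.1, 2.0, 0.3, 5.0]
end_logits = [4.0, 0.2, 1.5, 0.1]
print(best_span(start_logits, end_logits))
```

With the model outputs from the usage example, the inputs would be `outputs.start_logits[0].tolist()` and `outputs.end_logits[0].tolist()`. A fuller implementation would also restrict candidates to context tokens so that the answer cannot fall inside the question.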

Limitations

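This model is trained on Vietnamese legal documents and may not generalize to other domains or languages. Inputs longer than 256 tokens, the maximum sequence length used in training, are truncated, so answers located beyond that point in a long context cannot be found.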

Training Framework

Framework: Transformers 4.44.2
Language: Vietnamese
License: Apache 2.0
