Indonesian Legal Document Analyzer

This model analyzes Indonesian legal documents, specifically court decisions (putusan pengadilan). It combines rule-based extraction with a spaCy NLP pipeline to extract structured information from the decision text.

Features

  • Case Type Classification: Identifies the legal case type, Perdata (civil) or Pidana (criminal)
  • Party Information Extraction: Extracts names and addresses of legal parties (Terdakwa, Penggugat, Tergugat, etc.)
  • Evidence Categorization: Automatically categorizes evidence into types (Narkotika, Senjata, Dokumen/Uang, Elektronik, Kendaraan)
  • Law Reference Extraction: Identifies mentioned laws and regulations (UU, KUHP, etc.)
  • Case Summary Extraction: Extracts key case summaries and judge verdicts
  • Keyword Extraction: Extracts important legal terms using lemmatization
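
To make the evidence categorization feature more concrete, here is a minimal sketch of keyword-based categorization. The keyword lists and the function name `categorize_evidence` are assumptions made for illustration; the analyzer's actual rules are not documented here and may differ.

```python
import re

# Hypothetical keyword lists, one per evidence category named in this README.
EVIDENCE_KEYWORDS = {
    "Narkotika": ["narkotika", "sabu", "ganja", "ekstasi"],
    "Senjata": ["senjata", "pisau", "parang", "pistol"],
    "Dokumen/Uang": ["uang", "surat", "kwitansi", "sertifikat"],
    "Elektronik": ["handphone", "hp", "laptop", "komputer"],
    "Kendaraan": ["sepeda motor", "mobil", "truk"],
}


def categorize_evidence(text):
    """Return every category whose keywords appear as whole words in the text."""
    lowered = text.lower()
    categories = []
    for category, keywords in EVIDENCE_KEYWORDS.items():
        for keyword in keywords:
            # Word boundaries avoid matching a keyword inside a longer word.
            if re.search(r"\b" + re.escape(keyword) + r"\b", lowered):
                categories.append(category)
                break
    return categories


sample = "1 (satu) unit sepeda motor Honda dan 1 (satu) buah handphone"
print(categorize_evidence(sample))
```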

Model Architecture

  • Base Model: Indonesian spaCy model (id_core_news_sm)
  • Hybrid Approach: Combines regex patterns with Named Entity Recognition (NER)
  • Safety Features: Text length limits, graceful fallbacks, comprehensive error handling
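
The regex side of the hybrid approach can be illustrated with a short sketch that reads the case number and infers the case type from its code segment (`Pdt` for Perdata, `Pid` for Pidana). The pattern and the function name `classify_case` are assumptions for this example, not the analyzer's actual implementation, and the NER side is not shown.

```python
import re

# Assumed pattern: <number>/<case code>/<year>/<court code>,
# e.g. 123/Pdt.G/2023/PN.Jkt.Sel
CASE_NUMBER_RE = re.compile(r"(\d+)/((?:Pdt|Pid)[\w.\-]*)/(\d{4})/([\w.]+)")


def classify_case(text):
    """Return (case_number, case_type); case_number is None if nothing matches."""
    match = CASE_NUMBER_RE.search(text)
    if match is None:
        return None, "Unknown"
    case_code = match.group(2)
    case_type = "Perdata" if case_code.startswith("Pdt") else "Pidana"
    return match.group(0), case_type


print(classify_case("PUTUSAN Nomor 123/Pdt.G/2023/PN.Jkt.Sel ..."))
```

Returning "Unknown" rather than raising keeps the sketch in line with the graceful-fallback behavior listed above.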

Usage

from legal_analyzer import IndonesianLegalAnalyzer

# Initialize the analyzer
analyzer = IndonesianLegalAnalyzer()

# Analyze legal document text
legal_text = "PUTUSAN Nomor 123/Pdt.G/2023/PN.Jkt.Sel ..."
results = analyzer.analyze_text(legal_text)

print(results)

Installation

  1. Install required dependencies:
pip install spacy pandas numpy
  2. Install the Indonesian spaCy model:
python -m spacy download id_core_news_sm
# OR manually download the wheel:
# pip install https://github.com/explosion/spacy-models/releases/download/id_core_news_sm-3.4.0/id_core_news_sm-3.4.0-py3-none-any.whl
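
Because spaCy pipelines installed this way are ordinary Python packages, a quick way to check the installation is to look the package up by name. The helper `model_installed` below is a hypothetical name used for illustration, not part of this project.

```python
import importlib.util


def model_installed(name="id_core_news_sm"):
    """Return True if a top-level package with this name can be imported."""
    return importlib.util.find_spec(name) is not None


if not model_installed():
    print("id_core_news_sm is not installed; see the installation steps above.")
```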

Output Schema

The model returns structured data with the following fields:

  • Filename: Source document identifier
  • Judul Kasus (Nomor Putusan): Case title and number
  • Jenis Kasus: Case type (Perdata/Pidana)
  • Label Pihak 1/2: Party labels (Terdakwa, Penggugat, etc.)
  • Nama Pihak 1/2: Party names
  • Alamat Pihak 1/2: Party addresses
  • Kata Kunci Utama: Key terms extracted
  • Kategori Barang Bukti: Evidence category
  • Kesimpulan Duduk Perkara: Case summary
  • Undang-Undang yang Disebutkan: Referenced laws
  • Barang Bukti: Evidence items
  • Putusan Hakim: Judge's verdict
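
As a sketch of how these fields might be handled downstream, the example below builds an empty record and serializes records to CSV. It assumes each result is a flat dictionary and that "Pihak 1/2" expands into separate "Pihak 1" and "Pihak 2" fields; both are assumptions, and `empty_record` and `to_csv` are hypothetical helpers.

```python
import csv
import io

# Field names taken from the schema above; the 1/2 expansion is an assumption.
FIELDS = [
    "Filename",
    "Judul Kasus (Nomor Putusan)",
    "Jenis Kasus",
    "Label Pihak 1", "Label Pihak 2",
    "Nama Pihak 1", "Nama Pihak 2",
    "Alamat Pihak 1", "Alamat Pihak 2",
    "Kata Kunci Utama",
    "Kategori Barang Bukti",
    "Kesimpulan Duduk Perkara",
    "Undang-Undang yang Disebutkan",
    "Barang Bukti",
    "Putusan Hakim",
]


def empty_record(filename):
    """Create a record with every schema field present and blank."""
    record = {field: "" for field in FIELDS}
    record["Filename"] = filename
    return record


def to_csv(records):
    """Serialize records to CSV text with the schema fields as the header."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


record = empty_record("putusan_001.pdf")
record["Jenis Kasus"] = "Perdata"
print(to_csv([record]))
```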

Performance

  • Text Processing: Handles documents up to 100,000 characters
  • Memory Management: Automatic text truncation prevents memory issues
  • Error Recovery: Fallback mechanisms for robust processing
  • Processing Speed: ~0.05 second delay per document
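
The truncation and fallback behavior described above could look roughly like the following. This is a sketch under assumptions: the constant `MAX_CHARS`, the helpers `truncate_text` and `safe_analyze`, and the shape of the error result are all hypothetical and not taken from the analyzer's source.

```python
MAX_CHARS = 100_000  # limit stated in this README


def truncate_text(text, limit=MAX_CHARS):
    """Cut the text to at most `limit` characters."""
    return text if len(text) <= limit else text[:limit]


def safe_analyze(analyze, text, limit=MAX_CHARS):
    """Run `analyze` on truncated text; return an error record instead of raising."""
    try:
        return analyze(truncate_text(text, limit))
    except Exception as exc:  # broad on purpose: one bad document should not stop a batch
        return {"error": str(exc)}


# Usage with a stand-in analysis function that just reports the input length.
print(safe_analyze(len, "x" * 150_000))
```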

Limitations

  • Designed specifically for Indonesian legal documents
  • Requires id_core_news_sm spaCy model
  • Optimized for court decision (putusan) format
  • May require manual tuning for other legal document types

Citation

If you use this model in your research, please cite:

@misc{indonesian-legal-analyzer-2024,
  title={Indonesian Legal Document Analyzer},
  author={NaBurju},
  year={2024},
  url={https://huggingface.co/nathangalung/indonesian-legal-analyzer}
}

License

This model is licensed under Apache 2.0. See the LICENSE file for details.
