Indonesian Legal Document Analyzer

This model analyzes Indonesian legal documents, specifically court decisions (putusan pengadilan). It combines rule-based extraction with a spaCy NLP pipeline to extract structured information.

Features

Case Type Classification: Identifies the legal case type: Perdata (civil) or Pidana (criminal)
Party Information Extraction: Extracts names and addresses of legal parties such as Terdakwa (criminal defendant), Penggugat (plaintiff), and Tergugat (civil defendant)
Evidence Categorization: Automatically sorts evidence into categories: Narkotika (narcotics), Senjata (weapons), Dokumen/Uang (documents/money), Elektronik (electronics), Kendaraan (vehicles)
Law Reference Extraction: Identifies cited laws and regulations, such as UU (Undang-Undang, statutes) and KUHP (the Criminal Code)
Case Summary Extraction: Extracts key case summaries and judge verdicts
Keyword Extraction: Extracts important legal terms using lemmatization
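
The rule-based side of these features can be illustrated with a few regular expressions and a keyword table. The sketch below is not the analyzer's actual code: the function names, the patterns, the keyword lists, and the "Unknown" and "Lainnya" fallback labels are all assumptions made for illustration. It relies on the convention that Indonesian case numbers carry "Pdt" for civil and "Pid" for criminal cases.

```python
import re

# Hypothetical pattern: case numbers such as 123/Pdt.G/2023/PN.Jkt.Sel
CASE_CODE_RE = re.compile(r"\d+/(Pdt|Pid)[^/\s]*/\d{4}/", re.IGNORECASE)

# Hypothetical pattern: "UU [RI] No. N Tahun YYYY" or a code abbreviation
LAW_RE = re.compile(
    r"\b(?:Undang-Undang|UU)\s+(?:(?:RI|Republik\s+Indonesia)\s+)?"
    r"(?:Nomor|No\.?)\s*\d+\s+Tahun\s+\d{4}"
    r"|\bKUH(?:Perdata|Pidana|AP|P)\b",
    re.IGNORECASE,
)

# Illustrative keyword lists; a real system would need far more terms.
EVIDENCE_KEYWORDS = {
    "Narkotika": ("sabu", "ganja", "narkotika", "ekstasi"),
    "Senjata": ("pisau", "senjata", "parang", "pistol"),
    "Dokumen/Uang": ("uang", "surat", "kwitansi", "sertifikat"),
    "Elektronik": ("handphone", "laptop", "telepon"),
    "Kendaraan": ("sepeda motor", "mobil", "kendaraan"),
}


def classify_case_type(text):
    """Return 'Perdata' or 'Pidana' based on the case-number code."""
    match = CASE_CODE_RE.search(text)
    if match is None:
        return "Unknown"
    return "Perdata" if match.group(1).lower() == "pdt" else "Pidana"


def extract_law_references(text):
    """Return the distinct law references in order of first appearance."""
    return list(dict.fromkeys(m.group(0) for m in LAW_RE.finditer(text)))


def categorize_evidence(item):
    """Return the first category whose keyword occurs in the item text."""
    lowered = item.lower()
    for category, keywords in EVIDENCE_KEYWORDS.items():
        # Plain substring matching: simple, but it can over-match
        # (for example "uang" inside "ruang").
        if any(keyword in lowered for keyword in keywords):
            return category
    return "Lainnya"
```

Calling `classify_case_type` on the sample text from the Usage section would classify it as Perdata, because the case number contains the "Pdt" code.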

Model Architecture

Base Model: Indonesian spaCy model (id_core_news_sm)
Hybrid Approach: Combines regex patterns with Named Entity Recognition (NER)
Safety Features: Text length limits, graceful fallbacks, comprehensive error handling
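
One way to picture the hybrid approach with its fallbacks is a function that tries a regex rule first and consults the NER pipeline only when the rule finds nothing. This is a minimal sketch under stated assumptions, not the analyzer's implementation: `load_pipeline`, `extract_party_name`, the `Nama lengkap` pattern, and the `PER`/`PERSON` entity labels are hypothetical.

```python
import re

MAX_CHARS = 100_000  # length limit quoted in the Performance section

# Hypothetical rule: identity lines such as "Nama lengkap : BUDI SANTOSO;"
NAME_RE = re.compile(r"Nama(?:\s+lengkap)?\s*:\s*([^\n;]+)", re.IGNORECASE)


def load_pipeline(model_name="id_core_news_sm"):
    """Return a spaCy pipeline, or None if spaCy or the model is missing."""
    try:
        import spacy
        return spacy.load(model_name)
    except (ImportError, OSError):
        # Graceful fallback: the caller continues with regex rules only.
        return None


def extract_party_name(text, nlp=None):
    """Regex first; fall back to NER if a pipeline is available."""
    text = text[:MAX_CHARS]  # text length limit
    match = NAME_RE.search(text)
    if match:
        return match.group(1).strip(" ;.")
    if nlp is not None:
        for ent in nlp(text).ents:
            # Assumed label names; check the labels of the loaded model.
            if ent.label_ in {"PER", "PERSON"}:
                return ent.text
    return None
```

Passing `nlp=load_pipeline()` enables the NER fallback when the model is installed; with `nlp=None` the function relies on the regex rule alone.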

Usage

from legal_analyzer import IndonesianLegalAnalyzer

# Initialize the analyzer
analyzer = IndonesianLegalAnalyzer()

# Analyze legal document text
legal_text = "PUTUSAN Nomor 123/Pdt.G/2023/PN.Jkt.Sel ..."
results = analyzer.analyze_text(legal_text)

print(results)

Installation

Install the required dependencies:

pip install spacy pandas numpy

Install the Indonesian spaCy model:

python -m spacy download id_core_news_sm
# Or install the wheel directly:
# pip install https://github.com/explosion/spacy-models/releases/download/id_core_news_sm-3.4.0/id_core_news_sm-3.4.0-py3-none-any.whl

Output Schema

The model returns structured data with the following fields:

Filename: Source document identifier
Judul Kasus (Nomor Putusan): Case title and number
Jenis Kasus: Case type (Perdata/Pidana)
Label Pihak 1/2: Party labels (Terdakwa, Penggugat, etc.)
Nama Pihak 1/2: Party names
Alamat Pihak 1/2: Party addresses
Kata Kunci Utama: Key terms extracted
Kategori Barang Bukti: Evidence category
Kesimpulan Duduk Perkara: Case summary
Undang-Undang yang Disebutkan: Referenced laws
Barang Bukti: Evidence items
Putusan Hakim: Judge's verdict
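
The schema can be made concrete with a small helper that normalizes a result and exports several results as CSV. The README does not state the return type of `analyze_text`, so this sketch assumes a dict keyed by the field names above, and it assumes that each "1/2" entry expands into two separate columns. `normalize_record` and `records_to_csv` are hypothetical helpers, not part of the package.

```python
import csv
import io

# Field names copied from the schema; splitting "1/2" into two columns
# per party is an assumption.
FIELDS = [
    "Filename",
    "Judul Kasus (Nomor Putusan)",
    "Jenis Kasus",
    "Label Pihak 1", "Nama Pihak 1", "Alamat Pihak 1",
    "Label Pihak 2", "Nama Pihak 2", "Alamat Pihak 2",
    "Kata Kunci Utama",
    "Kategori Barang Bukti",
    "Kesimpulan Duduk Perkara",
    "Undang-Undang yang Disebutkan",
    "Barang Bukti",
    "Putusan Hakim",
]


def normalize_record(result):
    """Return a dict holding every schema field, with '' for missing ones."""
    return {field: result.get(field, "") for field in FIELDS}


def records_to_csv(results):
    """Serialize a list of result dicts to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    for result in results:
        writer.writerow(normalize_record(result))
    return buffer.getvalue()
```

Normalizing before writing keeps the columns stable even when a document yields no value for some field, for example a decision with only one named party.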

Performance

Text Processing: Handles documents up to 100,000 characters
Memory Management: Automatic text truncation prevents memory issues
Error Recovery: Fallback mechanisms for robust processing
Processing Speed: ~0.05-second delay per document
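
The figures above suggest a batch loop along the following lines. This is a sketch, not the shipped code: it assumes that truncation is a simple cut at the character limit, that the per-document delay is a sleep between documents, and that a failed document is recorded rather than aborting the batch. `truncate` and `process_batch` are hypothetical names.

```python
import time

MAX_CHARS = 100_000   # document length limit stated above
DELAY_SECONDS = 0.05  # per-document delay stated above


def truncate(text, limit=MAX_CHARS):
    """Cut the text at `limit` characters to bound memory use."""
    return text if len(text) <= limit else text[:limit]


def process_batch(documents, analyze, delay=DELAY_SECONDS):
    """Run `analyze` on each (name, text) pair without stopping on errors."""
    results = []
    for name, text in documents:
        try:
            result = analyze(truncate(text))
        except Exception as exc:  # broad on purpose: keep the batch running
            result = {"error": str(exc)}
        results.append((name, result))
        time.sleep(delay)
    return results
```

Here `analyze` would typically be `analyzer.analyze_text` from the Usage section; any callable that accepts a string works.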

Limitations

Designed specifically for Indonesian legal documents
Requires id_core_news_sm spaCy model
Optimized for court decision (putusan) format
May require manual tuning for other legal document types

Citation

If you use this model in your research, please cite:

@misc{indonesian-legal-analyzer-2024,
  title={Indonesian Legal Document Analyzer},
  author={NaBurju},
  year={2024},
  url={https://huggingface.co/nathangalung/indonesian-legal-analyzer}
}

License

This model is licensed under the Apache License 2.0. See the LICENSE file for details.