Indonesian Legal Document Analyzer
This model analyzes Indonesian legal documents, specifically court decisions (putusan pengadilan). It combines rule-based extraction with a spaCy NLP pipeline to pull structured information out of the document text.
Features
- Case Type Classification: Identifies legal case types (Perdata, Pidana)
- Party Information Extraction: Extracts names and addresses of legal parties (Terdakwa, Penggugat, Tergugat, etc.)
- Evidence Categorization: Automatically categorizes evidence into types (Narkotika, Senjata, Dokumen/Uang, Elektronik, Kendaraan)
- Law Reference Extraction: Identifies mentioned laws and regulations (UU, KUHP, etc.)
- Case Summary Extraction: Extracts the case summary and the judge's verdict
- Keyword Extraction: Extracts important legal terms using lemmatization
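To make the rule-based side of these features concrete, here is a minimal sketch of case type classification and law reference extraction with regular expressions. It is not the analyzer's actual code: the names `CASE_TYPE_CODES`, `classify_case_type`, and `extract_law_references` are hypothetical, and the patterns assume that case numbers carry a `Pdt` or `Pid` code and that laws are cited as "Undang-Undang/UU Nomor N Tahun YYYY" or by the abbreviations KUHP, KUHAP, and KUHPerdata.

```python
import re

# Hypothetical mapping from case-number codes to case types; the
# analyzer's real rules may differ.
CASE_TYPE_CODES = {"Pdt": "Perdata", "Pid": "Pidana"}

# Matches the case-type code in numbers such as 123/Pdt.G/2023/PN.Jkt.Sel.
CASE_NUMBER_RE = re.compile(r"\b\d+/(Pdt|Pid)\.[\w.\-]*/\d{4}/", re.IGNORECASE)

# Matches "Undang-Undang Nomor 35 Tahun 2009", "UU No. 8 Tahun 1981",
# and bare mentions of KUHP, KUHAP, or KUHPerdata.
LAW_RE = re.compile(
    r"\b(?:Undang-Undang|UU)(?:\s+RI)?\s+(?:Nomor|No\.?)\s*\d+\s+Tahun\s+\d{4}"
    r"|\bKUH(?:Perdata|AP|P)\b",
    re.IGNORECASE,
)


def classify_case_type(text):
    """Return 'Perdata', 'Pidana', or None if no case number is found."""
    match = CASE_NUMBER_RE.search(text)
    if match is None:
        return None
    return CASE_TYPE_CODES[match.group(1).capitalize()]


def extract_law_references(text):
    """Return the distinct law references in order of first appearance."""
    seen = []
    for match in LAW_RE.finditer(text):
        ref = " ".join(match.group(0).split())  # collapse internal whitespace
        if ref not in seen:
            seen.append(ref)
    return seen


if __name__ == "__main__":
    sample = (
        "PUTUSAN Nomor 45/Pid.Sus/2022/PN Mdn ... Pasal 112 ayat (1) "
        "Undang-Undang Nomor 35 Tahun 2009 tentang Narkotika jo. Pasal 55 KUHP"
    )
    print(classify_case_type(sample))
    print(extract_law_references(sample))
```

Patterns like these are cheap and predictable, which is why a rule-based pass is a reasonable first step before any statistical model is involved.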
Model Architecture
- Base Model: Indonesian spaCy model (id_core_news_sm)
- Hybrid Approach: Combines regex patterns with Named Entity Recognition (NER)
- Safety Features: Text length limits, graceful fallbacks, comprehensive error handling
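The hybrid approach with a graceful fallback could look roughly like the sketch below. It is an illustration under stated assumptions, not the analyzer's implementation: `extract_parties` and `PARTY_RE` are hypothetical names, the "Label: Name" pattern is a simplification of how parties appear in real decisions, and the entity labels `PER`/`PERSON` depend on the loaded model. The `nlp` argument is any callable that behaves like a spaCy pipeline, returning an object whose `ents` items expose `.text` and `.label_`.

```python
import re

# Party labels come from the feature list; the "Label: Name" pattern
# itself is a simplifying assumption.
PARTY_RE = re.compile(
    r"\b(Terdakwa|Penggugat|Tergugat)\s*:\s*([^\n;,]+)", re.IGNORECASE
)


def extract_parties(text, nlp=None):
    """Return (label, name) pairs from regex rules, extended by NER if available."""
    parties = [
        (m.group(1).capitalize(), m.group(2).strip())
        for m in PARTY_RE.finditer(text)
    ]
    if nlp is None:
        return parties  # graceful fallback: regex results only

    try:
        doc = nlp(text)
    except Exception:
        return parties  # an NER failure must not discard the regex results

    known = {name for _, name in parties}
    for ent in doc.ents:
        # Person labels vary between models, so both spellings are accepted.
        if ent.label_ in ("PER", "PERSON") and ent.text not in known:
            parties.append(("Unlabelled", ent.text))
            known.add(ent.text)
    return parties


if __name__ == "__main__":
    text = "Terdakwa: Budi Santoso\nPenggugat: Siti Aminah"
    print(extract_parties(text))  # regex only, no pipeline loaded
```

Running the rules first means a missing or failing spaCy model degrades the output instead of aborting the analysis.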
Usage
from legal_analyzer import IndonesianLegalAnalyzer
# Initialize the analyzer
analyzer = IndonesianLegalAnalyzer()
# Analyze legal document text
legal_text = "PUTUSAN Nomor 123/Pdt.G/2023/PN.Jkt.Sel ..."
results = analyzer.analyze_text(legal_text)
print(results)
Installation
- Install required dependencies:
pip install spacy pandas numpy
- Install Indonesian spaCy model:
python -m spacy download id_core_news_sm
# OR manually download the wheel:
# pip install https://github.com/explosion/spacy-models/releases/download/id_core_news_sm-3.4.0/id_core_news_sm-3.4.0-py3-none-any.whl
Output Schema
The model returns structured data with the following fields:
- Filename: Source document identifier
- Judul Kasus (Nomor Putusan): Case title and number
- Jenis Kasus: Case type (Perdata/Pidana)
- Label Pihak 1/2: Party labels (Terdakwa, Penggugat, etc.)
- Nama Pihak 1/2: Party names
- Alamat Pihak 1/2: Party addresses
- Kata Kunci Utama: Key terms extracted
- Kategori Barang Bukti: Evidence category
- Kesimpulan Duduk Perkara: Case summary
- Undang-Undang yang Disebutkan: Referenced laws
- Barang Bukti: Evidence items
- Putusan Hakim: Judge's verdict
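When consuming the output downstream, it can help to normalize each record against the schema. The sketch below assumes that results arrive as a plain dict keyed by the field names above and that "1/2" stands for two separate keys (for example "Nama Pihak 1" and "Nama Pihak 2"). Both points are assumptions, and `OUTPUT_FIELDS` and `normalize_record` are hypothetical helpers, not part of the package.

```python
# Field names from the output schema; splitting "1/2" into two keys is an
# assumption about how the analyzer names them.
OUTPUT_FIELDS = [
    "Filename",
    "Judul Kasus (Nomor Putusan)",
    "Jenis Kasus",
    "Label Pihak 1",
    "Label Pihak 2",
    "Nama Pihak 1",
    "Nama Pihak 2",
    "Alamat Pihak 1",
    "Alamat Pihak 2",
    "Kata Kunci Utama",
    "Kategori Barang Bukti",
    "Kesimpulan Duduk Perkara",
    "Undang-Undang yang Disebutkan",
    "Barang Bukti",
    "Putusan Hakim",
]


def normalize_record(record):
    """Return a dict with every schema field in schema order; gaps become ''."""
    return {field: record.get(field, "") for field in OUTPUT_FIELDS}


if __name__ == "__main__":
    partial = {"Filename": "putusan_001.txt", "Jenis Kasus": "Pidana"}
    for field, value in normalize_record(partial).items():
        print(f"{field}: {value!r}")
```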
Performance
- Text Processing: Handles documents up to 100,000 characters
- Memory Management: Automatic text truncation prevents memory issues
- Error Recovery: Fallback mechanisms for robust processing
- Processing Speed: ~0.05 second delay per document
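The text length limit can be pictured with a small sketch. How the analyzer truncates internally is not documented here, so `truncate_text`, the 200-character look-back window, and the preference for cutting at a space are all assumptions. Only the 100,000-character limit comes from the list above.

```python
MAX_CHARS = 100_000  # limit stated in the Performance list


def truncate_text(text, max_chars=MAX_CHARS, window=200):
    """Cut text to at most max_chars, preferring a space near the limit."""
    if len(text) <= max_chars:
        return text
    # Look for the last space within `window` characters before the limit
    # so that a word is not split in half.
    cut = text.rfind(" ", max(0, max_chars - window), max_chars)
    if cut <= 0:
        cut = max_chars  # no usable space found: fall back to a hard cut
    return text[:cut]


if __name__ == "__main__":
    long_text = "kata " * 30_000  # 150,000 characters
    print(len(truncate_text(long_text)))
```

Bounding the input before it reaches the NLP pipeline keeps memory use predictable, at the cost of ignoring anything past the limit.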
Limitations
- Designed specifically for Indonesian legal documents
- Requires id_core_news_sm spaCy model
- Optimized for court decision (putusan) format
- May require manual tuning for other legal document types
Citation
If you use this model in your research, please cite:
@misc{indonesian-legal-analyzer-2024,
title={Indonesian Legal Document Analyzer},
author={NaBurju},
year={2024},
url={https://huggingface.co/nathangalung/indonesian-legal-analyzer}
}
License
This model is licensed under Apache 2.0. See the LICENSE file for details.