---
license: gpl-2.0
language: id
tags:
- spacy
- ner
- token-classification
- indonesian
library_name: spacy
---
# Indonesian NER spaCy Model
This is a Named Entity Recognition (NER) model for the Indonesian language, built with spaCy.
## Model Details
- **Language**: Indonesian (`id`)
- **Pipeline**: `ner`
- **spaCy Version**: `>=3.8.7,<3.9.0`
- **Model Architecture**: Transition-based parser with HashEmbedCNN tok2vec
## Supported Entity Types
The model recognizes the following entity types:
- `CARDINAL` - Cardinal numbers
- `DATE` - Date expressions
- `EVENT` - Events
- `FACILITY` - Facilities
- `GPE` - Geopolitical entities
- `LANGUAGE` - Languages
- `LAW` - Legal documents
- `LOCATION` - Locations
- `MISC` - Miscellaneous
- `MONEY` - Monetary values
- `NORP` - Nationalities or religious/political groups
- `ORDINAL` - Ordinal numbers
- `ORGANIZATION` - Organizations
- `PERCENT` - Percentages
- `PERSON` - People
- `PRODUCT` - Products
- `QUANTITY` - Quantities
- `TIME` - Time expressions
- `TITLE` - Titles
## Usage
```python
import spacy
# Load the model
nlp = spacy.load("asmud/ner-spacy-indonesian")
# Process text
doc = nlp("Presiden Joko Widodo mengunjungi Jakarta pada tanggal 17 Agustus 2024.")
# Extract entities
for ent in doc.ents:
    print(f"{ent.text} -> {ent.label_}")
```
## Installation
```bash
pip install https://huggingface.co/asmud/ner-spacy-indonesian/resolve/main/ner-spacy-indonesian-any-py3-none-any.whl
```
After installing the package, load it with spaCy:
```python
import spacy
nlp = spacy.load("asmud/ner-spacy-indonesian")
```
## Model Architecture
- **tok2vec**: HashEmbedCNN with 96-dimensional embeddings, depth 4, embed size 2000
- **ner**: Transition-based parser with 64 hidden units, maxout pieces 2
- **Training**: 100 iterations with dropout 0.5, compounding batch sizes (4-32)
- **Optimizer**: Adam (lr=0.001, L2=0.01, grad_clip=1.0)
## Training Configuration
### Training Data Format
The model was trained on data with custom XML-like tags:
```
Presiden Joko Widodo mengunjungi Jakarta pada 17 Agustus 2024.
```
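The tags themselves are not shown in the sample above. As a minimal sketch, assuming a hypothetical `<LABEL>text</LABEL>` syntax (the real tag format may differ), tagged text could be converted into plain text plus character-offset spans of the kind spaCy training examples use:

```python
import re

# Hypothetical tag syntax: <LABEL>entity text</LABEL>. The model card does not
# show the real tags, so adjust this pattern to the actual corpus.
TAG_RE = re.compile(r"<([A-Z]+)>(.*?)</\1>", re.DOTALL)


def parse_tagged(tagged):
    """Return (plain_text, [(start, end, label), ...]) with character offsets."""
    parts, spans = [], []
    cursor = 0   # read position in the tagged input
    out_len = 0  # length of the plain text built so far
    for m in TAG_RE.finditer(tagged):
        before = tagged[cursor:m.start()]
        parts.append(before)
        out_len += len(before)
        label, inner = m.group(1), m.group(2)
        spans.append((out_len, out_len + len(inner), label))
        parts.append(inner)
        out_len += len(inner)
        cursor = m.end()
    parts.append(tagged[cursor:])
    return "".join(parts), spans


# Illustrative input; the labels chosen here are assumptions.
tagged = (
    "Presiden <PERSON>Joko Widodo</PERSON> mengunjungi <GPE>Jakarta</GPE> "
    "pada <DATE>17 Agustus 2024</DATE>."
)
text, spans = parse_tagged(tagged)
for start, end, label in spans:
    print(text[start:end], "->", label)
```

Nested tags are not handled by this sketch.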
### Training Parameters
- **Iterations**: 100 training iterations
- **Dropout**: 0.5 during training
- **Batch Size**: Compounding from 4 to 32 examples
- **Text Preprocessing**: Lowercased input text
- **Data Shuffling**: Random shuffling each iteration
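To illustrate what a compounding batch size means, here is a small pure-Python sketch of a schedule that grows geometrically from 4 and is capped at 32. The growth factor is not stated in this card; the value 1.5 below is an assumption chosen only to make the growth visible (real training runs typically use a much smaller factor).

```python
from itertools import islice


def compounding(start, stop, compound):
    """Yield batch sizes growing geometrically from start, capped at stop."""
    size = start
    while True:
        yield min(size, stop)
        size *= compound


def minibatches(items, sizes):
    """Split items into consecutive batches whose sizes follow the schedule."""
    items = list(items)
    i = 0
    while i < len(items):
        n = int(next(sizes))
        yield items[i:i + n]
        i += n


# 4 and 32 come from the model card; the factor 1.5 is an assumption.
schedule = compounding(4.0, 32.0, 1.5)
print([int(s) for s in islice(schedule, 8)])

# Batch a toy dataset of 20 examples with a fresh schedule.
batches = list(minibatches(range(20), compounding(4.0, 32.0, 1.5)))
print([len(b) for b in batches])
```

Early batches are small, which gives frequent updates at the start of training, and later batches are larger.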
### Architecture Details
- **Embedding Width**: 96 dimensions
- **Hidden Width**: 64 units
- **Embed Size**: 2000 features
- **Window Size**: 1
- **Maxout Pieces**: 3 (tok2vec), 2 (parser)
- **Subword Features**: Enabled
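For readers who want to map these values onto spaCy v3 configuration keys, the following is a hedged sketch written as a Python dictionary. The numeric values are taken from this card; the architecture and optimizer version suffixes (`.v2`, `.v1`) are assumptions, and the original training may well have used a custom loop rather than a config file.

```python
# Hyperparameter values come from the model card. The registered function
# names' version suffixes are assumptions, not confirmed by the card.
config = {
    "components": {
        "ner": {
            "factory": "ner",
            "model": {
                "@architectures": "spacy.TransitionBasedParser.v2",
                "state_type": "ner",
                "hidden_width": 64,
                "maxout_pieces": 2,
                "tok2vec": {
                    "@architectures": "spacy.HashEmbedCNN.v2",
                    "width": 96,
                    "depth": 4,
                    "embed_size": 2000,
                    "window_size": 1,
                    "maxout_pieces": 3,
                    "subword_features": True,
                },
            },
        }
    },
    "training": {
        "dropout": 0.5,
        "optimizer": {
            "@optimizers": "Adam.v1",
            "learn_rate": 0.001,
            "L2": 0.01,
            "grad_clip": 1.0,
        },
    },
}

tok2vec = config["components"]["ner"]["model"]["tok2vec"]
print(tok2vec["width"], tok2vec["depth"], tok2vec["embed_size"])
```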
## Model Evaluation
### Performance Metrics
The model was evaluated on 2,987 examples drawn from the training data, so the scores below reflect in-sample performance:
#### Overall Performance
- **Precision**: 0.9846
- **Recall**: 0.9865
- **F1-score**: 0.9856
#### Per-Entity Performance
| Entity | Precision | Recall | F1-score |
|--------|-----------|--------|----------|
| PRODUCT | 1.0000 | 1.0000 | 1.0000 |
| LOCATION | 1.0000 | 1.0000 | 1.0000 |
| LANGUAGE | 1.0000 | 1.0000 | 1.0000 |
| EVENT | 0.9962 | 1.0000 | 0.9981 |
| MISC | 0.9973 | 0.9960 | 0.9966 |
| FACILITY | 0.9923 | 1.0000 | 0.9961 |
| LAW | 1.0000 | 0.9919 | 0.9959 |
| TITLE | 0.9947 | 0.9947 | 0.9947 |
| GPE | 1.0000 | 0.9886 | 0.9943 |
| NORP | 0.9872 | 1.0000 | 0.9935 |
| PERSON | 0.9935 | 0.9935 | 0.9935 |
| DATE | 0.9926 | 0.9830 | 0.9878 |
| ORDINAL | 0.9750 | 1.0000 | 0.9873 |
| MONEY | 0.9683 | 0.9946 | 0.9812 |
| ORGANIZATION | 0.9457 | 0.9905 | 0.9676 |
| TIME | 0.9476 | 0.9819 | 0.9645 |
| QUANTITY | 0.9874 | 0.9291 | 0.9574 |
| PERCENT | 0.8600 | 1.0000 | 0.9247 |
| CARDINAL | 0.9620 | 0.8736 | 0.9157 |
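Scores like those above are typically computed at the span level. The sketch below assumes exact-match scoring, where a predicted entity counts as correct only if its start, end, and label all match a gold entity; whether the analyzer script uses exactly this rule is not confirmed here. The function names and toy data are illustrative.

```python
from collections import Counter


def prf(tp, fp, fn):
    """Precision, recall, and F1 from true/false positive and false negative counts."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


def score_spans(gold, pred):
    """gold and pred: one set of (start, end, label) tuples per document."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        for span in p:
            (tp if span in g else fp)[span[2]] += 1
        for span in g - p:
            fn[span[2]] += 1
    labels = sorted(set(tp) | set(fp) | set(fn))
    per_label = {lab: prf(tp[lab], fp[lab], fn[lab]) for lab in labels}
    overall = prf(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return overall, per_label


# Toy data: the DATE prediction has a wrong start offset, so it is a miss.
gold = [{(9, 20, "PERSON"), (33, 40, "GPE"), (46, 61, "DATE")}]
pred = [{(9, 20, "PERSON"), (33, 40, "GPE"), (49, 61, "DATE")}]
overall, per_label = score_spans(gold, pred)
print("overall P/R/F1:", overall)
for label, scores in per_label.items():
    print(label, scores)
```

spaCy's own evaluation reports comparable values under the `ents_p`, `ents_r`, `ents_f`, and `ents_per_type` keys.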
### Evaluation Features
You can reproduce these metrics using the included analyzer script:
```bash
# Install required dependencies
pip install streamlit pandas
# Run the analyzer
streamlit run spacy_model_analyzer.py
```
The analyzer provides:
- **Interactive Analysis**: Real-time entity recognition testing
- **Detailed Metrics**: Precision, recall, and F1-score calculations
- **Text Alignment**: Automatic handling of entity boundary alignment
- **Visualization**: Entity highlighting and analysis tools
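Entity boundary alignment usually means snapping annotated character offsets onto token boundaries. How the analyzer does this is not documented here; the sketch below is one possible approach, using whitespace-delimited tokens as a simplifying assumption (spaCy's tokenizer also splits punctuation). In spaCy itself, `doc.char_span(start, end, label=..., alignment_mode="expand")` performs a similar expansion against real tokens.

```python
import re


def snap_to_tokens(text, start, end):
    """Expand a character span so it starts and ends on whitespace-token boundaries."""
    new_start, new_end = start, end
    for m in re.finditer(r"\S+", text):
        if m.start() <= start < m.end():
            new_start = m.start()  # span begins inside this token
        if m.start() < end <= m.end():
            new_end = m.end()      # span ends inside this token
    return new_start, new_end


text = "Presiden Joko Widodo mengunjungi Jakarta."
# A misaligned annotation covering "ko Wido" is expanded to whole tokens.
start, end = snap_to_tokens(text, 11, 18)
print(repr(text[start:end]))
```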