
Unggah-Ungguh-Javanese-LSTM-Classifier

Unggah-Ungguh-Javanese-LSTM-Classifier is a CNN-BiLSTM model for Javanese honorific level classification. This model is part of the Unggah-Ungguh project and serves as a strong non-transformer baseline for the task introduced in the paper "Do Language Models Understand Honorific Systems in Javanese?".

Model description

  • Model type: Convolutional + Bidirectional LSTM classifier
  • Language: Javanese
  • License: CC-BY-NC 4.0
  • Framework: Keras (TensorFlow backend)
  • Training: Trained on a curated dataset of Javanese sentences annotated with honorific labels

Model Sources

Using the model

import json

from tensorflow.keras.preprocessing.sequence import pad_sequences

from Baseline_LSTM import Config, build_model

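# Load the tokenizer vocabulary
with open("tokenizer.json", "r", encoding="utf-8") as f:
    tokenizer_data = json.load(f)

# Use the "word_index" mapping if the file has one; otherwise treat the
# whole file as the word-to-id mapping
word_index = tokenizer_data.get("word_index", tokenizer_data)
config = Config()

# Tokenize the example sentence on whitespace; unknown words fall back to
# the "<unk>" id (or 1 if the vocabulary has no "<unk>" entry)
text = "Mbak Srini mangan pecel ajange pincuk"
tokens = [word_index.get(word, word_index.get("<unk>", 1)) for word in text.split()]
# Pad with zeros at the end up to config.MAX_LEN
tokens_padded = pad_sequences([tokens], maxlen=config.MAX_LEN, padding='post')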

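# Rebuild the architecture, then load the trained weights
model = build_model(config)
model.load_weights("baseline_lstm_model.h5")

# Predict: one row of class scores per input; argmax picks the class index
prediction = model.predict(tokens_padded)
label = prediction.argmax(axis=1)[0]
print("Predicted class:", label)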
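
The tokenize-and-pad step above can be sketched in plain Python to make its behavior explicit. This is a minimal illustration, not the project's own preprocessing: the helper name `encode` and the toy vocabulary are hypothetical, and the padding id of 0 is the `pad_sequences` default. Note that with `padding='post'` alone, `pad_sequences` still truncates over-long inputs from the front (its default is `truncating='pre'`), which the sketch mirrors.

```python
def encode(text, word_index, max_len, pad_id=0):
    """Map a whitespace-tokenized sentence to a fixed-length list of ids.

    Hypothetical helper that mirrors the snippet above; not part of Baseline_LSTM.
    """
    # Same out-of-vocabulary fallback as the snippet above
    unk_id = word_index.get("<unk>", 1)
    ids = [word_index.get(word, unk_id) for word in text.split()]
    # pad_sequences defaults to truncating='pre': keep the last max_len ids
    ids = ids[-max_len:]
    # padding='post': append pad ids until the sequence has max_len entries
    return ids + [pad_id] * (max_len - len(ids))


# Toy vocabulary for illustration only (not the released tokenizer.json)
toy_index = {"<unk>": 1, "mangan": 2, "pecel": 3}
print(encode("Srini mangan pecel", toy_index, max_len=5))  # [1, 2, 3, 0, 0]
```

If the vocabulary was built with lowercasing (the Keras `Tokenizer` default), capitalized words such as "Mbak" would map to the unknown id unless the text is lowercased first; whether that applies here depends on how `tokenizer.json` was produced.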

License and Use

Unggah-Ungguh is licensed under CC-BY-NC 4.0.

Citation

@article{farhansyah2025language,
  title={Do Language Models Understand Honorific Systems in Javanese?},
  author={Farhansyah, Mohammad Rifqi and Darmawan, Iwan and Kusumawardhana, Adryan and Winata, Genta Indra and Aji, Alham Fikri and Wijaya, Derry Tanti},
  journal={arXiv preprint arXiv:2502.20864},
  year={2025}
}