---
license: cc-by-nc-4.0
language:
- jv
datasets:
- JavaneseHonorifics/Unggah-Ungguh
base_model:
- w11wo/javanese-distilbert-small-imdb
pipeline_tag: text-classification
library_name: transformers
---

# Unggah-Ungguh-Javanese-Distilbert-Classifier

Unggah-Ungguh-Javanese-Distilbert-Classifier is part of the Unggah-Ungguh model family. It is a classifier for the Javanese honorific classification task described in "Do Language Models Understand Honorific Systems in Javanese?". Check out [our paper](https://arxiv.org/abs/2502.20864) for more information!

## Model description

- **Model type:** A classifier trained on the highly curated Unggah-Ungguh dataset, which represents Javanese honorific rules and systems.
- **Language(s) (NLP):** Javanese
- **License:** CC-BY-NC 4.0
- **Finetuned from model:** w11wo/javanese-distilbert-small-imdb

## Model Sources

- **Project Page:** https://javanesehonorifics.github.io/
- **Repository:** https://github.com/JavaneseHonorifics
- **Paper:** https://arxiv.org/abs/2502.20864

## Using the model

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load the tokenizer and the fine-tuned classifier from the Hugging Face Hub.
model_path = "JavaneseHonorifics/Unggah-Ungguh-Javanese-Distilbert-Classifier"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForSequenceClassification.from_pretrained(model_path)

# Javanese sentence to classify.
INPUT_TEXT = "Mbak Srini mangan pecel ajange pincuk"

# Tokenize as a batch of one and return PyTorch tensors.
tokenized_input = tokenizer([INPUT_TEXT], return_tensors="pt", truncation=True, padding=True)

# Run inference without tracking gradients.
with torch.no_grad():
    outputs = model(**tokenized_input)

# The predicted class is the index of the largest logit.
y_pred = outputs.logits.argmax(-1)
print("Predicted class:", y_pred.item())
```

## License and Use

Unggah-Ungguh is licensed under CC-BY-NC 4.0.

## Citation

```bibtex
@article{farhansyah2025language,
  title={Do Language Models Understand Honorific Systems in Javanese?},
  author={Farhansyah, Mohammad Rifqi and Darmawan, Iwan and Kusumawardhana, Adryan and Winata, Genta Indra and Aji, Alham Fikri and Wijaya, Derry Tanti},
  journal={arXiv preprint arXiv:2502.20864},
  year={2025}
}
```
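
The snippet under "Using the model" prints only the index of the predicted class. As a minimal sketch, the helper below turns one row of logits into a class index, a label, and a softmax probability. The names `softmax` and `describe_prediction` are hypothetical and not part of this repository, the example logits are placeholders rather than real model output, and the label names stored in `model.config.id2label` for this checkpoint are not stated in this card, so the mapping is passed in as an argument.

```python
import math


def softmax(logits):
    """Convert a list of raw logits into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def describe_prediction(logits, id2label=None):
    """Return (class_id, label, probability) for the highest-scoring class.

    `id2label` is an optional dict mapping class indices to label names.
    When it is missing, or lacks the index, the index itself is used as the label.
    """
    probs = softmax(logits)
    class_id = max(range(len(probs)), key=probs.__getitem__)
    label = id2label.get(class_id, str(class_id)) if id2label else str(class_id)
    return class_id, label, probs[class_id]


# Placeholder logits for illustration only, not real model output.
example_logits = [0.1, 2.3, -1.0, 0.4]
print(describe_prediction(example_logits))
```

With the model loaded as in the usage snippet, the call would look like `describe_prediction(outputs.logits[0].tolist(), model.config.id2label)`, assuming the checkpoint's config defines `id2label`.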