metadata
license: apache-2.0
datasets:
- Arailym-aitu/small_kazakh_corpus
language:
- kk
metrics:
- accuracy
- f1
base_model:
- nur-dev/roberta-kaz-large
new_version: nur-dev/roberta-kaz-large
pipeline_tag: text-classification
Model Card
This model is designed for text classification in the Kazakh language. It is based on the RoBERTa architecture (base model: nur-dev/roberta-kaz-large) and fine-tuned on the Small Kazakh Corpus dataset (Arailym-aitu/small_kazakh_corpus).
Model Details
Model Description
The model aims to enhance natural language processing (NLP) capabilities for the Kazakh language, particularly in text classification tasks.
- Developed by: Tleubayeva Arailym, Tabuldin Aisultan, Aubakirov Sultan
- Model type: Transformer-based (RoBERTa)
- Language(s) (NLP): Kazakh (kk)
- License: apache-2.0
Results
Evaluation results show an improvement in both accuracy and F1-score:
Base model performance:
Accuracy: 50.30%
F1-score: 48.89%
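Fine-tuned model performance:
Accuracy: 55.51% (+5.21 percentage points, about 10.4% relative improvement)
F1-score: 54.83% (+5.94 percentage points, about 12.1% relative improvement)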
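The gains can be recomputed from the reported scores. Absolute gains (in percentage points) and relative gains (as a percentage of the base score) are different quantities and should not be mixed. The sketch below uses only the numbers listed above; the helper name `improvement` is illustrative and is not part of any released code.

```python
# Reported scores in percent, taken from the results above.
base = {"accuracy": 50.30, "f1": 48.89}
fine_tuned = {"accuracy": 55.51, "f1": 54.83}


def improvement(before: float, after: float) -> tuple[float, float]:
    """Return (absolute gain in percentage points, relative gain in percent)."""
    absolute = after - before
    relative = absolute / before * 100
    return round(absolute, 2), round(relative, 1)


for metric in base:
    pp, rel = improvement(base[metric], fine_tuned[metric])
    print(f"{metric}: +{pp:.2f} pp ({rel:+.1f}% relative)")
```

This prints a gain of +5.21 points (+10.4% relative) for accuracy and +5.94 points (+12.1% relative) for F1-score.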
Citation
Citation information will be added later.
Model Card Authors
Tleubayeva Arailym, PhD student at Astana IT University
Tabuldin Aisultan, third-year student at Astana IT University
Aubakirov Sultan, third-year student at Astana IT University