metadata

license: apache-2.0
datasets:
  - Arailym-aitu/small_kazakh_corpus
language:
  - kk
metrics:
  - accuracy
  - f1
base_model:
  - nur-dev/roberta-kaz-large
new_version: nur-dev/roberta-kaz-large
pipeline_tag: text-classification

Model Card for a Kazakh Text Classification Model (RoBERTa)

This model performs text classification in the Kazakh language. It is based on the RoBERTa architecture (nur-dev/roberta-kaz-large) and fine-tuned on the Small Kazakh Corpus dataset (Arailym-aitu/small_kazakh_corpus).

Model Details

Model Description

The model aims to enhance natural language processing (NLP) capabilities for the Kazakh language, particularly in text classification tasks.

Developed by: Tleubayeva Arailym, Tabuldin Aisultan, Aubakirov Sultan
Model type: Transformer-based (RoBERTa)
Language(s) (NLP): Kazakh (kk)
License: apache-2.0
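
A minimal usage sketch, assuming the fine-tuned checkpoint is published on the Hugging Face Hub with a sequence-classification head and loads through the standard `transformers` pipeline. This card does not state the repository name, so `MODEL_ID` below is a hypothetical placeholder, and the label names returned depend on how the checkpoint was exported.

```python
# MODEL_ID is a hypothetical placeholder: replace it with the actual
# repository name of the fine-tuned model, which this card does not state.
MODEL_ID = "your-namespace/your-finetuned-kazakh-classifier"


def load_classifier(model_id=MODEL_ID):
    """Build a text-classification pipeline.

    Requires `pip install transformers torch`. The import is done lazily
    so the helper below can be used without the library installed.
    """
    from transformers import pipeline

    return pipeline("text-classification", model=model_id)


def pair_predictions(texts, predictions):
    """Pair each input text with its predicted label and rounded score.

    `predictions` is expected in the pipeline's default output shape:
    one dict with "label" and "score" keys per input text.
    """
    return [
        (text, pred["label"], round(pred["score"], 4))
        for text, pred in zip(texts, predictions)
    ]


def classify(texts, model_id=MODEL_ID):
    """Classify a list of Kazakh texts and return (text, label, score) tuples."""
    classifier = load_classifier(model_id)
    return pair_predictions(texts, classifier(texts))
```

Once `MODEL_ID` points at a real checkpoint, calling `classify(["Бүгін ауа райы өте жақсы."])` would return one tuple per input sentence.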

Results

Evaluation shows that fine-tuning improves both accuracy and F1-score over the base model:

Base model performance:

Accuracy: 50.30%

F1-score: 48.89%

Fine-tuned model performance:

Accuracy: 55.51% (+5.21 percentage points, about 10% relative)

F1-score: 54.83% (+5.94 percentage points, about 12% relative)
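
The gains can be recomputed from the four reported scores. The short sketch below separates the absolute gain (in percentage points) from the relative gain (as a percentage of the base score), since the two are easy to confuse. The function name `gains` is illustrative only.

```python
# Scores reported in this card, in percent.
base = {"accuracy": 50.30, "f1": 48.89}
fine_tuned = {"accuracy": 55.51, "f1": 54.83}


def gains(before, after):
    """Return (absolute gain in percentage points, relative gain in percent)."""
    absolute = after - before
    relative = absolute / before * 100
    return absolute, relative


for metric in base:
    absolute, relative = gains(base[metric], fine_tuned[metric])
    print(f"{metric}: +{absolute:.2f} points, +{relative:.1f}% relative")

# accuracy: +5.21 points, +10.4% relative
# f1: +5.94 points, +12.1% relative
```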

Citation

Citation information will be added later.

Model Card Authors

Tleubayeva Arailym, PhD student at Astana IT University

Tabuldin Aisultan, third-year student at Astana IT University

Aubakirov Sultan, third-year student at Astana IT University