Model Card for yeniguno/bert-uncased-turkish-intent-classification

This is a fine-tuned BERT-based model for Turkish intent classification, capable of categorizing intents into 82 distinct labels. It was trained on a consolidated dataset of multilingual intent datasets, translated and normalized to Turkish.

How to Get Started with the Model

from transformers import AutoModelForSequenceClassification, AutoTokenizer, pipeline

# Load the fine-tuned model and its tokenizer from the Hugging Face Hub
model = AutoModelForSequenceClassification.from_pretrained("yeniguno/bert-uncased-turkish-intent-classification")
tokenizer = AutoTokenizer.from_pretrained("yeniguno/bert-uncased-turkish-intent-classification")

# Wrap model and tokenizer in a text-classification pipeline
pipe = pipeline("text-classification", model=model, tokenizer=tokenizer)

text = "Şarkıyı çal, Sam."
prediction = pipe(text)

print(prediction)
# [{'label': 'play_music', 'score': 0.999117910861969}]

Uses

This model is intended for:

Natural Language Understanding (NLU) tasks on Turkish text, in particular classifying user intents for applications such as:

  • Voice assistants
  • Chatbots
  • Customer support automation
  • Conversational AI systems
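As a sketch of how such an application might consume the model's output, here is a minimal intent-routing loop. The `classify` function is a stand-in for the pipeline call shown above (stubbed so the example runs offline), and the handler names are hypothetical:

```python
# Sketch: routing a predicted intent label to an application handler.
# `classify` stubs the Hugging Face pipeline call; handlers are illustrative.

def classify(text):
    # Placeholder for: pipe(text)[0]
    return {"label": "play_music", "score": 0.999}

def handle_play_music(text):
    return f"Playing music for request: {text!r}"

def handle_unknown(text):
    return "Sorry, I did not understand that."

# Dispatch table: one handler per intent label the application supports
HANDLERS = {
    "play_music": handle_play_music,
}

def route(text):
    prediction = classify(text)
    handler = HANDLERS.get(prediction["label"], handle_unknown)
    return handler(text)

print(route("Şarkıyı çal, Sam."))
```

In a real system, `classify` would call the pipeline and `HANDLERS` would cover whichever of the 82 labels the application acts on; unhandled labels fall through to the fallback.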

Bias, Risks, and Limitations

  • Performance may degrade on intents that are underrepresented in the training data.
  • The model is not optimized for languages other than Turkish.
  • Domain-specific intents not covered by the training data may require additional fine-tuning.
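One common mitigation for inputs outside the 82 trained intents is to reject low-confidence predictions. The sketch below applies an illustrative (untuned) threshold to a list of (label, probability) pairs such as the pipeline returns; the threshold value and the `out_of_scope` fallback label are assumptions, not part of the model:

```python
import math

def softmax(logits):
    """Convert raw logits to probabilities (numerically stable)."""
    exps = [math.exp(x - max(logits)) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict_with_fallback(scored_labels, threshold=0.7):
    """scored_labels: list of (label, probability) pairs.
    Returns the top label, or 'out_of_scope' if confidence is too low."""
    label, score = max(scored_labels, key=lambda pair: pair[1])
    if score < threshold:
        return "out_of_scope"
    return label

probs = softmax([4.2, 0.3, 0.1])
labels = list(zip(["play_music", "get_weather", "set_alarm"], probs))
print(predict_with_fallback(labels))  # high-confidence case -> "play_music"
```

The right threshold depends on the application's tolerance for false rejections and is best tuned on a held-out set containing out-of-scope inputs.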

Training Details

Training Data

This model was trained on a combination of intent datasets from various sources, normalized to Turkish:

Datasets Used:

  • mteb/amazon_massive_intent
  • mteb/mtop_intent
  • sonos-nlu-benchmark/snips_built_in_intents
  • Mozilla/smart_intent_dataset
  • Bhuvaneshwari/intent_classification
  • clinc/clinc_oos

Each dataset was preprocessed and translated to Turkish where necessary, and intent labels were consolidated into 82 unique classes.
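The consolidation step can be pictured as a mapping from (source dataset, raw label) pairs onto the shared label set. The mapping entries below are illustrative only, not the actual 82-class mapping used for this model:

```python
# Sketch of label consolidation across source datasets.
# Entries are hypothetical examples, not the real mapping table.
LABEL_MAP = {
    ("mtop_intent", "PLAY_MUSIC"): "play_music",
    ("snips", "PlayMusic"): "play_music",
    ("clinc_oos", "weather"): "get_weather",
}

def consolidate(dataset_name, raw_label):
    """Map a source dataset's raw intent name to the unified label set."""
    try:
        return LABEL_MAP[(dataset_name, raw_label)]
    except KeyError:
        raise ValueError(f"No consolidated label for {dataset_name}/{raw_label}")

print(consolidate("snips", "PlayMusic"))  # -> play_music
```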

Dataset Sizes:

  • Training: 150,235
  • Validation: 18,780
  • Test: 18,779

Training Procedure

The model was fine-tuned with the following hyperparameters:

  • Base Model: bert-base-uncased
  • Learning Rate: 3e-5
  • Batch Size: 32
  • Epochs: 5
  • Weight Decay: 0.01
  • Evaluation Strategy: per epoch
  • Precision: FP32 (no mixed precision)
  • Hardware: A100 GPU
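These hyperparameters correspond to a standard Hugging Face `Trainer` configuration. A hedged reconstruction (argument names follow the `transformers` API; this is a sketch, not the original training script):

```python
from transformers import TrainingArguments, Trainer

training_args = TrainingArguments(
    output_dir="bert-turkish-intent",
    learning_rate=3e-5,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    num_train_epochs=5,
    weight_decay=0.01,
    eval_strategy="epoch",  # named `evaluation_strategy` in older transformers versions
    # FP32 is the default; no fp16/bf16 flags are set
)

# trainer = Trainer(model=model, args=training_args,
#                   train_dataset=train_ds, eval_dataset=val_ds)
# trainer.train()
```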

Evaluation

Results

Training and Validation:

Epoch | Training Loss | Validation Loss | Accuracy | F1 Score | Precision | Recall
------+---------------+-----------------+----------+----------+-----------+-------
  1   |    0.3485     |     0.3438      |  91.16%  |  90.56%  |  90.89%   | 91.16%
  2   |    0.2262     |     0.2418      |  93.73%  |  93.61%  |  93.67%   | 93.73%
  3   |    0.1407     |     0.2389      |  94.33%  |  94.20%  |  94.23%   | 94.33%
  4   |    0.1002     |     0.2390      |  94.68%  |  94.59%  |  94.60%   | 94.68%
  5   |    0.0588     |     0.2481      |  94.87%  |  94.81%  |  94.83%   | 94.87%

Test Results:

Metric    | Value
----------+--------
Loss      | 0.2457
Accuracy  | 94.79%
F1 Score  | 94.79%
Precision | 94.85%
Recall    | 94.79%
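Recall matching accuracy exactly at every epoch is what you would expect if the reported metrics are support-weighted averages: weighted recall reduces to plain accuracy by construction. A minimal check on toy labels:

```python
from collections import Counter

# Toy predictions over three classes; demonstrates that support-weighted
# recall equals overall accuracy by construction.
y_true = ["a", "a", "b", "b", "b", "c"]
y_pred = ["a", "b", "b", "b", "c", "c"]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

support = Counter(y_true)
recall_per_class = {
    cls: sum(t == p == cls for t, p in zip(y_true, y_pred)) / n
    for cls, n in support.items()
}
weighted_recall = sum(n / len(y_true) * recall_per_class[cls]
                      for cls, n in support.items())

print(accuracy, weighted_recall)  # identical values
```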
Model size: 110M parameters (F32 tensors, Safetensors format)