metadata

license: apache-2.0
language:
  - ku
  - en
metrics:
  - accuracy
pipeline_tag: text-classification
library_name: adapter-transformers

Kurdish Language Detector Model

This model is a fine-tuned version of abdulhade/RoBERTa-large-SizeCorpus_1B that classifies input text as Kurdish or English. It was fine-tuned on a custom bilingual corpus of Kurdish and English text.

Model Overview

Model Type: Text classification (language detection)
Base Model: abdulhade/RoBERTa-large-SizeCorpus_1B
Languages Supported: English, Kurdish
Training Data: Custom bilingual corpus of English and Kurdish text
Primary Use Case: Identifying whether input text is in English or Kurdish

Model Performance

The model achieved the following results on the evaluation set:

Evaluation Loss: 0.0012
Evaluation Accuracy: 99.99%
Evaluation F1 Score: 0.9999
Evaluation Precision: 0.99999
Evaluation Recall: 0.99983
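
As a quick consistency check on the figures above, F1 is the harmonic mean of precision and recall. The sketch below recomputes it from the reported values. The function name `f1_from_precision_recall` is illustrative and not part of the model's code.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values reported in the evaluation results above.
reported_precision = 0.99999
reported_recall = 0.99983

f1 = f1_from_precision_recall(reported_precision, reported_recall)
print(f"F1 recomputed from precision and recall: {f1:.4f}")
```

Rounded to four decimal places, the recomputed value agrees with the reported F1 score of 0.9999.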

Training Details

Training Loss: 0.027
Training Runtime: 40,500.85 seconds
Samples per Second (Training): 72.35
Steps per Second (Training): 4.52
Epochs: 3

Evaluation Details

Evaluation Runtime: 4,111.17 seconds
Samples per Second (Evaluation): 237.58
Steps per Second (Evaluation): 14.85
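
The runtime and throughput figures above can be combined into rough dataset and batch sizes. The sketch below assumes the reported rates are averages over the whole run, which the card does not state explicitly. The helper names are illustrative, and the results are estimates rather than figures given by the author.

```python
def estimated_samples(runtime_s: float, samples_per_s: float, epochs: int = 1) -> float:
    """Estimate samples per pass from total runtime and average throughput."""
    return runtime_s * samples_per_s / epochs


def estimated_samples_per_step(samples_per_s: float, steps_per_s: float) -> float:
    """Estimate how many samples are processed in one step."""
    return samples_per_s / steps_per_s


# Figures copied from the training and evaluation details above.
train_samples = estimated_samples(40_500.85, 72.35, epochs=3)
eval_samples = estimated_samples(4_111.17, 237.58)
train_batch = estimated_samples_per_step(72.35, 4.52)
eval_batch = estimated_samples_per_step(237.58, 14.85)

print(f"Estimated training samples per epoch: {train_samples:,.0f}")
print(f"Estimated evaluation samples:         {eval_samples:,.0f}")
print(f"Estimated samples per training step:  {train_batch:.1f}")
print(f"Estimated samples per eval step:      {eval_batch:.1f}")
```

Under these assumptions the estimates come out to roughly 977,000 samples per training epoch, a similar number of evaluation samples, and about 16 samples per step. The per-step figure is an effective value: it would include any gradient accumulation or multi-device batching.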

Hardware and Environment

Environment: Accelerated hardware (e.g., GPU)
Default Inference Device: CPU (pass device=0 to pipeline() to run on the first GPU)
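
The device note above can be sketched as follows. This assumes `torch` and `transformers` are installed. `resolve_device` and `build_detector` are hypothetical helper names, and the model ID is the one used in the Quickstart Guide.

```python
def resolve_device(cuda_available: bool) -> int:
    """Map GPU availability to the pipeline's integer device index."""
    # 0 selects the first CUDA GPU; -1 keeps inference on the CPU.
    return 0 if cuda_available else -1


def build_detector():
    """Load the detector on the GPU when one is available, else on the CPU."""
    # Imported lazily so resolve_device can be used without these packages.
    import torch
    from transformers import pipeline

    return pipeline(
        "text-classification",
        model="abdulhade/kurdishRoBERTa-language-detector-1B",
        device=resolve_device(torch.cuda.is_available()),
    )
```

Calling `build_detector()` downloads the model on first use and returns a pipeline that can be called like the one in the Quickstart Guide.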

Quickstart Guide

Installation

Ensure you have the transformers library and torch installed:

pip install transformers torch

Usage

from transformers import pipeline

# Load the Kurdish Language Detector
kurdish_detector = pipeline('text-classification', 
                            model='abdulhade/kurdishRoBERTa-language-detector-1B', 
                            tokenizer='abdulhade/kurdishRoBERTa-language-detector-1B')

# Perform a prediction. The result is a one-item list whose label is
# LABEL_0 (English) or LABEL_1 (Kurdish).
result = kurdish_detector("Insert your text here")
print(result)  # e.g. [{'label': 'LABEL_0', 'score': <probability>}]

# Custom function to map the labels
def map_labels(prediction):
    label_mapping = {
        'LABEL_0': 'English',
        'LABEL_1': 'Kurdish'
    }
    # Map the label and keep the score as is
    return {'label': label_mapping[prediction['label']], 'score': prediction['score']}

# Test the model with new input and map the labels
input_text_1 = "Hello World"
input_text_2 = "Hi dear    برام  دەنگ و باست"

# Get predictions
predictions_1 = kurdish_detector(input_text_1)
predictions_2 = kurdish_detector(input_text_2)

# Map and print results
mapped_predictions_1 = [map_labels(pred) for pred in predictions_1]
mapped_predictions_2 = [map_labels(pred) for pred in predictions_2]
print(input_text_1)
print(mapped_predictions_1)  # Expected output: [{'label': 'English', 'score': <score>}]
print(input_text_2)
print(mapped_predictions_2)  # Expected output: [{'label': 'Kurdish', 'score': <score>}]
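
For more than one input, the pipeline also accepts a list of strings and returns one prediction per string. The sketch below applies the same label mapping to such a batch. `label_batch` is a hypothetical helper, and the sample predictions are invented placeholders standing in for real pipeline output, so the code runs without downloading the model.

```python
LABEL_MAPPING = {"LABEL_0": "English", "LABEL_1": "Kurdish"}


def label_batch(texts, predictions):
    """Pair each input text with its readable language label and score."""
    if len(texts) != len(predictions):
        raise ValueError("texts and predictions must have the same length")
    return [
        {"text": text, "language": LABEL_MAPPING[pred["label"]], "score": pred["score"]}
        for text, pred in zip(texts, predictions)
    ]


texts = ["Hello World", "برام دەنگ و باست"]

# Placeholder predictions with invented scores, not real model output.
# In practice: predictions = kurdish_detector(texts)
placeholder_predictions = [
    {"label": "LABEL_0", "score": 0.99},
    {"label": "LABEL_1", "score": 0.99},
]

for row in label_batch(texts, placeholder_predictions):
    print(row)
```

Keeping the mapping in one module-level dictionary means a label name only has to be changed in one place.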