metadata
license: apache-2.0
language:
  - ku
  - en
metrics:
  - accuracy
pipeline_tag: text-classification
library_name: adapter-transformers

Kurdish Language Detector Model

This model is a fine-tuned version of abdulhade/RoBERTa-large-SizeCorpus_1B that classifies input text as either Kurdish or English. It was fine-tuned on a custom bilingual corpus of Kurdish and English text.

Model Overview

  • Model Type: Text classification (language detection)
  • Base Model: abdulhade/RoBERTa-large-SizeCorpus_1B
  • Languages Supported: English, Kurdish
  • Training Data: Custom bilingual corpus of English and Kurdish text
  • Primary Use Case: Identifying whether input text is in English or Kurdish

Model Performance

On the evaluation set, the model achieved the following results:

  • Evaluation Loss: 0.0012
  • Evaluation Accuracy: 99.99%
  • Evaluation F1 Score: 0.9999
  • Evaluation Precision: 0.99999
  • Evaluation Recall: 0.99983
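As a quick consistency check on the figures above, F1 is the harmonic mean of precision and recall, so it can be recomputed from the two reported values. The sketch below is plain arithmetic on the numbers in this list; the function name `f1_score` is chosen here for illustration and is not part of the model's code.

```python
# Precision and recall exactly as reported in the Model Performance list.
precision = 0.99999
recall = 0.99983


def f1_score(p: float, r: float) -> float:
    """Return the harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)


f1 = f1_score(precision, recall)
print(f"F1 = {f1:.4f}")  # agrees with the reported 0.9999 at four decimals
```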

Training Details

  • Training Loss: 0.027
  • Training Runtime: 40,500.85 seconds
  • Samples per Second (Training): 72.35
  • Steps per Second (Training): 4.52
  • Epochs: 3

Evaluation Details

  • Evaluation Runtime: 4,111.17 seconds
  • Samples per Second (Evaluation): 237.58
  • Steps per Second (Evaluation): 14.85
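The throughput figures in Training Details and Evaluation Details can be combined to estimate quantities the card does not state directly. The sketch below assumes the metrics follow the usual convention that samples per second equals total samples processed divided by runtime, and that steps per second is defined the same way for steps; under that assumption, runtime times samples per second approximates the number of samples processed, and samples per second divided by steps per second approximates the effective number of samples per step. All results are estimates derived from rounded inputs.

```python
# Figures copied from the Training Details and Evaluation Details lists.
train_runtime_s = 40_500.85
train_samples_per_s = 72.35
train_steps_per_s = 4.52
epochs = 3

eval_runtime_s = 4_111.17
eval_samples_per_s = 237.58
eval_steps_per_s = 14.85

# Estimated samples processed (assumes throughput = samples / runtime).
train_samples_total = train_runtime_s * train_samples_per_s
train_samples_per_epoch = train_samples_total / epochs
eval_samples = eval_runtime_s * eval_samples_per_s

# Estimated effective samples per optimizer/evaluation step.
train_samples_per_step = train_samples_per_s / train_steps_per_s
eval_samples_per_step = eval_samples_per_s / eval_steps_per_s

print(f"Training samples per epoch (est.): {train_samples_per_epoch:,.0f}")
print(f"Evaluation samples (est.):         {eval_samples:,.0f}")
print(f"Training samples per step (est.):  {train_samples_per_step:.1f}")
print(f"Evaluation samples per step (est.): {eval_samples_per_step:.1f}")
```

Under these assumptions, both ratios come out at about 16 samples per step. The effective value may reflect per-device batch size, gradient accumulation, and device count together, none of which the card specifies.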

Hardware and Environment

  • Environment: Accelerated hardware (e.g., GPU)
  • Default Inference Device: CPU (pass device=0 to pipeline() to run on the first GPU)
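One way to choose the inference device at runtime is sketched below. The helper names `pick_device` and `build_pipeline_kwargs` are hypothetical and introduced only for this example; the sketch relies on the `device` argument of `transformers.pipeline`, where `-1` selects the CPU and `0` selects the first CUDA GPU. It builds the keyword arguments without loading the model, so it runs even where the weights are not downloaded.

```python
import importlib.util

MODEL_ID = "abdulhade/kurdishRoBERTa-language-detector-1B"


def pick_device() -> int:
    """Return 0 if torch is installed and reports a CUDA GPU, otherwise -1 (CPU)."""
    if importlib.util.find_spec("torch") is None:
        return -1
    import torch

    return 0 if torch.cuda.is_available() else -1


def build_pipeline_kwargs(device=None) -> dict:
    """Build keyword arguments for transformers.pipeline (hypothetical helper)."""
    return {
        "task": "text-classification",
        "model": MODEL_ID,
        "tokenizer": MODEL_ID,
        "device": pick_device() if device is None else device,
    }


kwargs = build_pipeline_kwargs()
print(kwargs["device"])  # -1 on CPU-only machines, 0 when a CUDA GPU is visible
```

With transformers installed, the detector could then be created with `pipeline(**build_pipeline_kwargs())`.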

Quickstart Guide

Installation

Ensure you have the transformers library and torch installed:

pip install transformers torch

Usage

Load the detector with the pipeline API and classify a string:

from transformers import pipeline

# Load the Kurdish Language Detector
kurdish_detector = pipeline('text-classification',
                            model='abdulhade/kurdishRoBERTa-language-detector-1B',
                            tokenizer='abdulhade/kurdishRoBERTa-language-detector-1B')

# Perform a prediction
result = kurdish_detector("Insert your text here")
# The pipeline returns a one-element list for a single string, for example
# [{'label': 'LABEL_0', 'score': <probability>}]; LABEL_0 is English, LABEL_1 is Kurdish.
print(result)

# Custom function to map the labels
def map_labels(prediction):
    label_mapping = {
        'LABEL_0': 'English',
        'LABEL_1': 'Kurdish'
    }
    # Map the label and keep the score as is
    return {'label': label_mapping[prediction['label']], 'score': prediction['score']}

# Test the model with new input and map the labels
input_text_1 = "Hello World"
input_text_2 = "Hi dear    برام  دەنگ و باست"

# Get predictions
predictions_1 = kurdish_detector(input_text_1)
predictions_2 = kurdish_detector(input_text_2)

# Map and print results
mapped_predictions_1 = [map_labels(pred) for pred in predictions_1]
mapped_predictions_2 = [map_labels(pred) for pred in predictions_2]
print(input_text_1)
print(mapped_predictions_1)  # Expected output: [{'label': 'English', 'score': <score>}]
print(input_text_2)
print(mapped_predictions_2)  # Expected output: [{'label': 'Kurdish', 'score': <score>}]
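For mixed or very short inputs, it can be useful to flag low-confidence predictions instead of always reporting a language. The sketch below is one possible post-processing step, not part of the model: the function name `describe`, the "Uncertain" label, and the 0.9 threshold are all assumptions chosen for illustration, and the sample predictions are hand-written dictionaries in the pipeline's output shape, not real model output.

```python
from typing import Dict, List

# Same label mapping as in the quickstart code.
LABEL_NAMES = {"LABEL_0": "English", "LABEL_1": "Kurdish"}


def describe(predictions: List[Dict], min_score: float = 0.9) -> List[Dict]:
    """Map raw labels to language names; mark scores below min_score as 'Uncertain'."""
    results = []
    for pred in predictions:
        name = LABEL_NAMES.get(pred["label"], pred["label"])
        if pred["score"] < min_score:
            name = "Uncertain"  # hypothetical fallback label
        results.append({"label": name, "score": pred["score"]})
    return results


# Hand-written examples in the pipeline's output format (not real model scores).
sample = [
    {"label": "LABEL_1", "score": 0.62},
    {"label": "LABEL_0", "score": 0.99},
]
print(describe(sample))
# [{'label': 'Uncertain', 'score': 0.62}, {'label': 'English', 'score': 0.99}]
```

The threshold should be tuned on representative data; 0.9 here is arbitrary.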