---
license: apache-2.0
language:
- ku
- en
metrics:
- accuracy
pipeline_tag: text-classification
library_name: adapter-transformers
---
# Kurdish Language Detector Model
This is a fine-tuned version of `abdulhade/RoBERTa-large-SizeCorpus_1B` that classifies input text as either Kurdish or English. It was trained on a custom bilingual corpus of Kurdish and English text.
## Model Overview
- **Model Type**: Text classification (language detection)
- **Base Model**: `abdulhade/RoBERTa-large-SizeCorpus_1B`
- **Languages Supported**: English, Kurdish
- **Training Data**: Custom bilingual corpus of English and Kurdish text
- **Primary Use Case**: Identifying whether input text is in English or Kurdish
## Model Performance
The model achieved the following results on the evaluation set:
- **Evaluation Loss**: 0.0012
- **Evaluation Accuracy**: 99.99%
- **Evaluation F1 Score**: 0.9999
- **Evaluation Precision**: 0.99999
- **Evaluation Recall**: 0.99983
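As a quick consistency check, the reported F1 score can be recomputed from the reported precision and recall using the standard harmonic-mean definition. This is a minimal sketch that assumes the figures above describe a single binary classification task; the variable names are illustrative.

```python
# Reported evaluation metrics from the list above.
precision = 0.99999
recall = 0.99983

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

print(f"F1 = {f1:.4f}")  # prints "F1 = 0.9999"
```

The recomputed value rounds to the reported F1 score of 0.9999.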
### Training Details
- **Training Loss**: 0.027
- **Training Runtime**: 40,500.85 seconds
- **Samples per Second (Training)**: 72.35
- **Steps per Second (Training)**: 4.52
- **Epochs**: 3
### Evaluation Details
- **Evaluation Runtime**: 4,111.17 seconds
- **Samples per Second (Evaluation)**: 237.58
- **Steps per Second (Evaluation)**: 14.85
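The throughput figures in the training and evaluation details can be combined to estimate quantities the card does not state directly, such as the batch size and the number of samples processed. The sketch below is back-of-the-envelope arithmetic only. It assumes the reported rates are averages over the whole run; settings such as gradient accumulation or multiple devices would change how "samples per step" should be read.

```python
# Figures reported in the training and evaluation details.
train_runtime_s = 40_500.85
train_samples_per_s = 72.35
train_steps_per_s = 4.52
epochs = 3

eval_runtime_s = 4_111.17
eval_samples_per_s = 237.58
eval_steps_per_s = 14.85

# Samples per step approximates the (effective) batch size.
train_batch = train_samples_per_s / train_steps_per_s
eval_batch = eval_samples_per_s / eval_steps_per_s

# Throughput times runtime approximates the number of samples processed.
train_samples_per_epoch = train_samples_per_s * train_runtime_s / epochs
eval_samples = eval_samples_per_s * eval_runtime_s

print(f"approx. training batch size:        {train_batch:.1f}")
print(f"approx. evaluation batch size:      {eval_batch:.1f}")
print(f"approx. training samples per epoch: {train_samples_per_epoch:,.0f}")
print(f"approx. evaluation samples:         {eval_samples:,.0f}")
```

Under these assumptions, both batch sizes come out at about 16, with roughly 977,000 samples per training epoch and a similar number of evaluation samples. These are estimates derived from the reported rates, not figures stated in the card.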
### Hardware and Environment
- **Environment**: Accelerated hardware (e.g., GPU)
- **Default Inference Device**: CPU (pass `device=0` to `pipeline` to run on the first GPU)
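One way to avoid hard-coding the device is to pick it at runtime. The following is a minimal sketch, assuming the `transformers` convention that `device=-1` means CPU and `device=0` means the first CUDA GPU. The helper names `pick_device` and `load_detector` are hypothetical and not part of this model's repository.

```python
def pick_device() -> int:
    """Return 0 (first GPU) if CUDA is usable, otherwise -1 (CPU)."""
    try:
        import torch
    except ImportError:
        # Without torch we cannot query CUDA, so fall back to CPU.
        return -1
    return 0 if torch.cuda.is_available() else -1


def load_detector():
    """Build the text-classification pipeline on the selected device."""
    # Imported lazily so that pick_device() works without transformers.
    from transformers import pipeline

    return pipeline(
        "text-classification",
        model="abdulhade/kurdishRoBERTa-language-detector-1B",
        device=pick_device(),
    )


print(pick_device())  # 0 on a machine with a usable GPU, otherwise -1
```

Calling `load_detector()` downloads the model on first use, so it is left uncalled here.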
## Quickstart Guide
### Installation
Ensure you have the `transformers` library and `torch` installed:
```bash
pip install transformers torch
```
### Usage
Load the model with the `transformers` pipeline and classify a piece of text:
```python
from transformers import pipeline

# Load the Kurdish Language Detector
kurdish_detector = pipeline(
    'text-classification',
    model='abdulhade/kurdishRoBERTa-language-detector-1B',
    tokenizer='abdulhade/kurdishRoBERTa-language-detector-1B',
)

# Perform a prediction
result = kurdish_detector("Insert your text here")
print(result)  # Output has the form: [{'label': 'LABEL_0' or 'LABEL_1', 'score': <probability>}]


# Custom function to map the raw labels to language names
def map_labels(prediction):
    label_mapping = {
        'LABEL_0': 'English',
        'LABEL_1': 'Kurdish',
    }
    # Map the label and keep the score as is
    return {'label': label_mapping[prediction['label']], 'score': prediction['score']}


# Test the model with new input and map the labels
input_text_1 = "Hello World"
input_text_2 = "Hi dear برام دەنگ و باست"

# Get predictions
predictions_1 = kurdish_detector(input_text_1)
predictions_2 = kurdish_detector(input_text_2)

# Map and print results
mapped_predictions_1 = [map_labels(pred) for pred in predictions_1]
mapped_predictions_2 = [map_labels(pred) for pred in predictions_2]

print(input_text_1)
print(mapped_predictions_1)  # Expected output: [{'label': 'English', 'score': <score>}]
print(input_text_2)
print(mapped_predictions_2)  # Expected output: [{'label': 'Kurdish', 'score': <score>}]
```