---
license: apache-2.0
language:
- ku
- en
metrics:
- accuracy
pipeline_tag: text-classification
library_name: adapter-transformers
---

# Kurdish Language Detector Model

This model is a fine-tuned version of `abdulhade/RoBERTa-large-SizeCorpus_1B` for language identification: given a text segment, it classifies the text as Kurdish or English. It was trained on a custom bilingual corpus of the two languages.

## Model Overview

- **Model Type**: Text classification (language detection)
- **Base Model**: `abdulhade/RoBERTa-large-SizeCorpus_1B`
- **Languages Supported**: English, Kurdish
- **Training Data**: Custom bilingual corpus of English and Kurdish text
- **Primary Use Case**: Identifying whether input text is in English or Kurdish

## Model Performance

The model achieved the following results on its evaluation set:

- **Evaluation Loss**: 0.0012
- **Evaluation Accuracy**: 99.99%
- **Evaluation F1 Score**: 0.9999
- **Evaluation Precision**: 0.99999
- **Evaluation Recall**: 0.99983

### Training Details

- **Training Loss**: 0.027
- **Training Runtime**: 40,500.85 seconds
- **Samples per Second (Training)**: 72.35
- **Steps per Second (Training)**: 4.52
- **Epochs**: 3

### Evaluation Details

- **Evaluation Runtime**: 4,111.17 seconds
- **Samples per Second (Evaluation)**: 237.58
- **Steps per Second (Evaluation)**: 14.85

### Hardware and Environment

- **Training Environment**: GPU-accelerated hardware
- **Default Inference Device**: CPU (pass `device=0` to the pipeline to run on the first GPU)
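
Rather than hard-coding `device=0`, the device can be chosen at runtime. This is a minimal sketch, assuming `torch` is installed; the commented-out pipeline call mirrors the Quickstart below:

```python
import torch

# transformers pipelines accept device=-1 for CPU and device=N for GPU N
device = 0 if torch.cuda.is_available() else -1

# The detector can then be created on the chosen device, e.g.:
# kurdish_detector = pipeline('text-classification',
#                             model='abdulhade/kurdishRoBERTa-language-detector-1B',
#                             device=device)
```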

## Quickstart Guide

### Installation

Ensure you have the `transformers` library and `torch` installed:

```bash
pip install transformers torch
```

### Usage

```python
from transformers import pipeline

# Load the Kurdish Language Detector
kurdish_detector = pipeline('text-classification',
                            model='abdulhade/kurdishRoBERTa-language-detector-1B',
                            tokenizer='abdulhade/kurdishRoBERTa-language-detector-1B')

# Perform a prediction
result = kurdish_detector("Insert your text here")
print(result)  # Outputs: [{'label': 'LABEL_1', 'score': <probability>}]

# Custom function to map the raw labels to language names
def map_labels(prediction):
    label_mapping = {
        'LABEL_0': 'English',
        'LABEL_1': 'Kurdish'
    }
    # Map the label and keep the score as is
    return {'label': label_mapping[prediction['label']], 'score': prediction['score']}

# Test the model with new input and map the labels
input_text_1 = "Hello World"
input_text_2 = "Hi dear برام دەنگ و باست"

# Get predictions
predictions_1 = kurdish_detector(input_text_1)
predictions_2 = kurdish_detector(input_text_2)

# Map and print results
mapped_predictions_1 = [map_labels(pred) for pred in predictions_1]
mapped_predictions_2 = [map_labels(pred) for pred in predictions_2]
print(input_text_1)
print(mapped_predictions_1)  # Expected output: [{'label': 'English', 'score': <score>}]
print(input_text_2)
print(mapped_predictions_2)  # Expected output: [{'label': 'Kurdish', 'score': <score>}]
```
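
The metrics above are measured on in-domain data; real-world input can be mixed-language or out-of-domain. A small wrapper around the label mapping (a hypothetical helper, not part of the model) can flag low-confidence predictions instead of forcing a language label:

```python
def map_labels_with_threshold(predictions, threshold=0.9):
    """Map raw pipeline labels to language names, flagging low-confidence results.

    `predictions` is the list of {'label': ..., 'score': ...} dicts that the
    pipeline returns; labels outside the known set map to 'Unknown'.
    """
    label_mapping = {'LABEL_0': 'English', 'LABEL_1': 'Kurdish'}
    mapped = []
    for pred in predictions:
        language = label_mapping.get(pred['label'], 'Unknown')
        if pred['score'] < threshold:
            language = 'Uncertain'
        mapped.append({'label': language, 'score': pred['score']})
    return mapped

# Works directly on the pipeline's output shape:
sample = [{'label': 'LABEL_1', 'score': 0.98}, {'label': 'LABEL_0', 'score': 0.55}]
print(map_labels_with_threshold(sample))
# [{'label': 'Kurdish', 'score': 0.98}, {'label': 'Uncertain', 'score': 0.55}]
```

The 0.9 threshold is an arbitrary starting point; tune it against your own data.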