🗣️ HuBERT Arabic Spoken Dialect Classifier

Model Overview

This model is a fine-tuned version of facebook/hubert-base-ls960 for Arabic spoken dialect classification. It identifies Modern Standard Arabic (MSA) and 17 regional Arabic dialects from raw audio.

It is intended for dialect identification, linguistic research, and dialect-aware speech processing systems.


🧠 Model Details

  • Base Model: facebook/hubert-base-ls960
  • Task: Audio Classification (Dialect Identification)
  • Languages: Arabic (MSA + 17 dialects)
  • Datasets:
    • QASR corpus (for MSA)
    • ADI17 dataset (for 17 Arabic dialects)

📊 Labels (id2label)

The model predicts one of the following 18 classes:

{
    "0": "MSA",   // Modern Standard Arabic
    "1": "IRA",   // Iraqi Arabic
    "2": "EGY",   // Egyptian Arabic
    "3": "MAU",   // Mauritanian Arabic
    "4": "KSA",   // Saudi Arabic
    "5": "UAE",   // Emirati Arabic
    "6": "SYR",   // Syrian Arabic
    "7": "PAL",   // Palestinian Arabic
    "8": "LEB",   // Lebanese Arabic
    "9": "LIB",   // Libyan Arabic
    "10": "KUW",  // Kuwaiti Arabic
    "11": "ALG",  // Algerian Arabic
    "12": "OMA",  // Omani Arabic
    "13": "QAT",  // Qatari Arabic
    "14": "YEM",  // Yemeni Arabic
    "15": "SUD",  // Sudanese Arabic
    "16": "MOR",  // Moroccan Arabic
    "17": "JOR",  // Jordanian Arabic
}
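
The mapping above can be handled in plain Python. The following is a minimal sketch, assuming the table matches `model.config.id2label`. After loading, transformers exposes the keys as integers rather than the strings shown in the JSON. The names `ID2LABEL`, `LABEL2ID`, and `label_for` are illustrative and are not part of the model repository.

```python
# Label table copied from the list above; keys are ints here,
# as they are in model.config.id2label once the model is loaded.
ID2LABEL = {
    0: "MSA", 1: "IRA", 2: "EGY", 3: "MAU", 4: "KSA", 5: "UAE",
    6: "SYR", 7: "PAL", 8: "LEB", 9: "LIB", 10: "KUW", 11: "ALG",
    12: "OMA", 13: "QAT", 14: "YEM", 15: "SUD", 16: "MOR", 17: "JOR",
}

# Reverse lookup (hypothetical helper): label code -> class id
LABEL2ID = {label: idx for idx, label in ID2LABEL.items()}


def label_for(class_id: int) -> str:
    """Return the dialect code for a predicted class id."""
    if class_id not in ID2LABEL:
        raise ValueError(f"Unknown class id: {class_id}")
    return ID2LABEL[class_id]
```

For example, `label_for(2)` returns `"EGY"` and `LABEL2ID["JOR"]` returns `17`.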

🚀 Usage

from transformers import Wav2Vec2FeatureExtractor, HubertForSequenceClassification
import torch
import torchaudio

# Load feature extractor and model
processor = Wav2Vec2FeatureExtractor.from_pretrained("IbrahimAmin/hubert-arabic-spoken-dialect-classifier")
model = HubertForSequenceClassification.from_pretrained("IbrahimAmin/hubert-arabic-spoken-dialect-classifier")

# Load audio (the model expects mono, 16 kHz input; both are enforced below)
waveform, sample_rate = torchaudio.load("your_audio.wav")

# Convert to mono if not already
if waveform.shape[0] > 1:
    waveform = torch.mean(waveform, dim=0, keepdim=True)

# Resample if needed to 16 kHz
if sample_rate != 16000:
    resampler = torchaudio.transforms.Resample(orig_freq=sample_rate, new_freq=16000)
    waveform = resampler(waveform)

# Extract features from the 1-D waveform (passed as a NumPy array)
inputs = processor(waveform.squeeze(0).numpy(), sampling_rate=16_000, return_tensors="pt")

# Run inference
with torch.inference_mode():
    logits = model(**inputs).logits

# Get predicted label
predicted_label = torch.argmax(logits, dim=-1).item()
print(f"Predicted Dialect: {model.config.id2label[predicted_label]}")
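
The snippet above reports only the single most likely class. To see how the remaining probability mass is distributed, the logits can be turned into ranked probabilities. The following is a minimal sketch in plain Python, assuming the logits have first been converted to a list of floats (for example with `logits[0].tolist()`); `top_k_dialects` is a hypothetical helper, not part of transformers or of this model.

```python
import math


def top_k_dialects(logits, id2label, k=3):
    """Return the k most probable (label, probability) pairs."""
    # Numerically stable softmax: subtract the max before exponentiating
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Sort class ids by probability, highest first
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(id2label[i], probs[i]) for i in ranked[:k]]


# Toy example with made-up logits and a three-class mapping
toy_labels = {0: "MSA", 1: "IRA", 2: "EGY"}
result = top_k_dialects([0.0, 2.0, 1.0], toy_labels, k=2)
```

With the real model, the call would look like `top_k_dialects(logits[0].tolist(), model.config.id2label)`.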

🏋️ Training Datasets

The model was trained on two corpora:

  • QASR corpus, which supplies the Modern Standard Arabic (MSA) class.
  • ADI17 dataset, which covers 17 spoken Arabic dialects from different countries and regions.

Citation

If you use this model in your research or application, please cite:

@misc{amin2025hubertarabicdialect,
  title={HuBERT Arabic Spoken Dialect Classifier},
  author={Ibrahim Amin},
  year={2025},
  publisher = {Hugging Face},
  howpublished={\url{https://huggingface.co/IbrahimAmin/hubert-arabic-spoken-dialect-classifier}},
}

Model size: 94.6M params (Safetensors, tensor type F32)