Elyadata/ADI-whisper-ADI20

metadata

language:
  - ar
pipeline_tag: audio-classification
library_name: speechbrain
tags:
  - DIalectID
  - ADI
  - ADI-20
  - speechbrain
  - Identification
  - pytorch
  - embeddings
datasets:
  - ADI-20
metrics:
  - f1
  - precision
  - recall
  - accuracy

Install Requirements

SpeechBrain

First, install SpeechBrain from its `develop` branch:

pip install git+https://github.com/speechbrain/speechbrain.git@develop

Clone the ADI-20 GitHub repository

git clone https://github.com/elyadata/ADI-20
cd ADI-20
pip install -r requirements.txt

Perform Arabic Dialect Identification

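import torch
from inference.classifier_attention_pooling import WhisperDialectClassifier

# Run on the GPU when one is available, otherwise fall back to the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"

# `source` is the location of the pretrained model files (local directory or Hugging Face repo id)
dialect_id = WhisperDialectClassifier.from_hparams(
    source="",
    hparams_file="hyperparms.yaml",
    savedir="pretrained_DID/tmp",
).to(device)
dialect_id.device = device

# Identify the dialect spoken in a single audio file
prediction = dialect_id.classify_file("filename.wav")
print(prediction)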
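A small helper like the one below can label a whole folder of recordings and score the predictions against known dialects. This is a sketch under stated assumptions. `classify_directory` and `macro_scores` are hypothetical names. The classifier is assumed only to expose the `classify_file` method used above. The metrics are macro-averaged over labels, and the card does not confirm that this averaging was used for its reported accuracy, precision, recall and F1.

```python
from pathlib import Path
from typing import Any, Dict, List


def classify_directory(classifier: Any, directory: str, pattern: str = "*.wav") -> Dict[str, Any]:
    """Hypothetical helper: call classifier.classify_file on every matching file.

    Only assumes the classifier has a classify_file(path) method; the raw
    output of each call is stored unchanged, keyed by file name.
    """
    results: Dict[str, Any] = {}
    for path in sorted(Path(directory).glob(pattern)):
        results[path.name] = classifier.classify_file(str(path))
    return results


def macro_scores(references: List[str], predictions: List[str]) -> Dict[str, float]:
    """Hypothetical helper: accuracy plus macro-averaged precision, recall and F1.

    Macro averaging (unweighted mean over every label seen in either list)
    is an assumption, not necessarily the averaging used for the model card.
    """
    if len(references) != len(predictions) or not references:
        raise ValueError("need two non-empty lists of equal length")
    pairs = list(zip(references, predictions))
    labels = sorted(set(references) | set(predictions))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(r == label and p == label for r, p in pairs)
        fp = sum(r != label and p == label for r, p in pairs)
        fn = sum(r == label and p != label for r, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(labels)
    return {
        "accuracy": sum(r == p for r, p in pairs) / len(pairs),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }
```

For example, `classify_directory(dialect_id, "recordings/")` returns one raw output per WAV file in that folder. The card does not document what `classify_file` returns. You therefore need to convert each output into a dialect label string before passing the lists to `macro_scores`.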

Citation

If you use this work, please cite:

@inproceedings{elleuch2025adi20,
  author    = {Haroun Elleuch and Salima Mdhaffar and Yannick Estève and Fethi Bougares},
  title     = {ADI-20: Arabic Dialect Identification Dataset and Models},
  booktitle = {Proceedings of the Annual Conference of the International Speech Communication Association (Interspeech)},
  year      = {2025},
  address   = {Rotterdam Ahoy Convention Centre, Rotterdam, The Netherlands},
  month     = {August},
  days      = {17--21}
}