---
base_model:
- facebook/mms-lid-256
language:
- fr
license: cc-by-nc-4.0
metrics:
- accuracy
pipeline_tag: audio-classification
tags:
- model_hub_mixin
- pytorch_model_hub_mixin
- speaker_dialect_classification
library_name: transformers
---

# MMS-LID-256 for French Dialect Classification

# Model Description

This model implements the French dialect classification described in **Voxlect: A Speech Foundation Model Benchmark for Modeling Dialects and Regional Languages Around the Globe**.

GitHub repository: https://github.com/tiantiaf0627/voxlect

The model assigns each utterance to one of four French dialect groups, defined by speaker region:

```
[
    "Africa",
    "France",
    "Canada",
    "Swiss/Belgium/German"
]
```

# How to use this model

## Download repo

```bash
git clone https://github.com/tiantiaf0627/voxlect.git
```

## Install the package

```bash
conda create -n voxlect python=3.8
conda activate voxlect
cd voxlect
pip install -e .
```

## Load the model

```python
# Load libraries
import torch
import torch.nn.functional as F
from src.model.dialect.mms_dialect import MMSWrapper

# Find device (always a torch.device, on GPU or CPU)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load model from Hugging Face
model = MMSWrapper.from_pretrained("tiantiaf/voxlect-french-dialect-mms-lid-256").to(device)
model.eval()
```

## Prediction

```python
# Label list
dialect_list = [
    "Africa",
    "France",
    "Canada",
    "Swiss/Belgium/German"
]

# Load data, here just zeros as an example
# Our training data filters out audio shorter than 3 seconds (unreliable predictions)
# and longer than 15 seconds (computation limitation)
# So you need to prepare your audio to a maximum of 15 seconds, 16kHz, and mono channel
max_audio_length = 15 * 16000
data = torch.zeros([1, 16000]).float().to(device)[:, :max_audio_length]

# Run inference without tracking gradients
with torch.no_grad():
    logits, embeddings = model(data, return_feature=True)

# Probability and output
dialect_prob = F.softmax(logits, dim=1)
print(dialect_list[torch.argmax(dialect_prob, dim=1).item()])
```

Responsible Use: Users should respect the privacy and consent of the data
subjects, and adhere to the relevant laws and regulations in their jurisdictions when using Voxlect.

## If you have any questions, please contact: Tiantian Feng (tiantiaf@usc.edu)

❌ **Out-of-Scope Use**
- Clinical or diagnostic applications
- Surveillance
- Privacy-invasive applications
- Commercial use

#### If you like our work or use the models in your work, kindly cite the following. We appreciate your recognition!

```
@article{feng2025voxlect,
  title={Voxlect: A Speech Foundation Model Benchmark for Modeling Dialects and Regional Languages Around the Globe},
  author={Feng, Tiantian and Huang, Kevin and Xu, Anfeng and Shi, Xuan and Lertpetchpun, Thanathai and Lee, Jihwan and Lee, Yoonjeong and Byrd, Dani and Narayanan, Shrikanth},
  journal={arXiv preprint arXiv:2508.01691},
  year={2025}
}
```
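
#### Handling recordings longer than 15 seconds

The prediction example caps input at 15 seconds of 16 kHz audio, so a longer recording has to be split before it is passed to the model. The model card does not say how to do this. The following is a minimal sketch that assumes non-overlapping consecutive windows and drops a trailing window shorter than 3 seconds, mirroring the training filter. `plan_windows` and its constants are hypothetical names for illustration and are not part of Voxlect.

```python
from typing import List, Tuple

SAMPLE_RATE = 16000  # the model expects 16 kHz mono audio
MIN_SECONDS = 3      # shorter clips were filtered out of the training data
MAX_SECONDS = 15     # longer clips were filtered out of the training data


def plan_windows(num_samples: int,
                 sample_rate: int = SAMPLE_RATE,
                 min_seconds: float = MIN_SECONDS,
                 max_seconds: float = MAX_SECONDS) -> List[Tuple[int, int]]:
    """Return (start, end) sample indices of consecutive windows.

    Hypothetical helper (an assumption, not the authors' method): each
    window is at most max_seconds long, and any window shorter than
    min_seconds is dropped.
    """
    min_len = int(min_seconds * sample_rate)
    max_len = int(max_seconds * sample_rate)
    if num_samples < 0 or max_len <= 0:
        raise ValueError("num_samples must be >= 0 and the window length positive")

    windows = []
    for start in range(0, num_samples, max_len):
        end = min(start + max_len, num_samples)
        # Keep only windows long enough for a reliable prediction
        if end - start >= min_len:
            windows.append((start, end))
    return windows
```

Each pair can then slice a `[1, num_samples]` tensor, for example `data[:, start:end]`, before calling the model as in the Prediction section. How to combine the per-window predictions (for instance by averaging probabilities) is left open here, since the source does not specify it.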