---
base_model:
- openai/whisper-large-v3
datasets:
- mozilla-foundation/common_voice_11_0
- ajd12342/paraspeechcaps
language:
- en
license: openrail
metrics:
- accuracy
pipeline_tag: audio-classification
tags:
- model_hub_mixin
- pytorch_model_hub_mixin
- speaker_dialect_classification
library_name: transformers
---

# Whisper-Large v3 for English Dialect Classification

# Model Description
This model implements the English dialect classification described in Voxlect: A Speech Foundation Model Benchmark for Modeling Dialect and Regional Languages Around the Globe.

Github repository: https://github.com/tiantiaf0627/voxlect

The included English dialects are:
```
[
    'East Asia',
    'English',
    'Germanic',
    'Irish',
    'North America',
    'Northern Irish',
    'Oceania',
    'Other',
    'Romance',
    'Scottish',
    'Semitic',
    'Slavic',
    'South African',
    'Southeast Asia',
    'South Asia',
    'Welsh'
]
```

Compared to the Vox-Profile English accent/dialect models, we trained with additional speech data from TIMIT and ParaSpeechCaps.

# How to use this model

## Download repo
```bash
git clone git@github.com:tiantiaf0627/voxlect.git
```
## Install the package
```bash
conda create -n voxlect python=3.8
cd voxlect
pip install -e .
```

## Load the model
```python
# Load libraries
import torch
import torch.nn.functional as F
from src.model.dialect.whisper_dialect import WhisperWrapper

# Find device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load model from Huggingface
model = WhisperWrapper.from_pretrained("tiantiaf/voxlect-english-dialect-whisper-large-v3").to(device)
model.eval()
```

## Prediction
```python
# Label list
dialect_list = [
    'East Asia',
    'English',
    'Germanic',
    'Irish',
    'North America',
    'Northern Irish',
    'Oceania',
    'Other',
    'Romance',
    'Scottish',
    'Semitic',
    'Slavic',
    'South African',
    'Southeast Asia',
    'South Asia',
    'Welsh'
]

# Load data; here we use zeros as a placeholder example
# Our training data filters out audio shorter than 3 seconds (unreliable predictions)
# and longer than 15 seconds (computation limitation), so prepare your audio as
# 16 kHz mono with a maximum length of 15 seconds
max_audio_length = 15 * 16000
data = torch.zeros([1, 16000]).float().to(device)[:, :max_audio_length]
logits, embeddings = model(data, return_feature=True)

# Probability and predicted dialect
dialect_prob = F.softmax(logits, dim=1)
print(dialect_list[torch.argmax(dialect_prob).detach().cpu().item()])
```
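
The model expects 16 kHz mono audio of at most 15 seconds, as noted in the comments above. The sketch below illustrates that preparation with plain PyTorch; the `prepare_audio` helper is hypothetical (not part of Voxlect), and resampling is left to a library such as torchaudio if your source rate differs from 16 kHz.

```python
import torch

def prepare_audio(waveform: torch.Tensor, sample_rate: int) -> torch.Tensor:
    """Downmix to mono and truncate to 15 s; the model expects 16 kHz mono input."""
    # Downmix multi-channel audio to mono by averaging channels.
    if waveform.dim() == 2 and waveform.size(0) > 1:
        waveform = waveform.mean(dim=0, keepdim=True)
    # Resampling (e.g. torchaudio.functional.resample) is required if the
    # source rate is not 16 kHz; omitted here to keep the sketch dependency-free.
    assert sample_rate == 16000, "resample to 16 kHz first"
    # Truncate to the 15-second maximum used during training.
    max_audio_length = 15 * 16000
    return waveform[:, :max_audio_length]

# Example: a 20-second mono clip at 16 kHz is truncated to 15 seconds.
clip = torch.zeros([1, 20 * 16000])
prepared = prepare_audio(clip, 16000)
print(prepared.shape)  # torch.Size([1, 240000])
```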

Responsible Use: Users should respect the privacy and consent of data subjects, and adhere to the relevant laws and regulations in their jurisdictions when using Voxlect.

## If you have any questions, please contact: Tiantian Feng ([email protected])

❌ **Out-of-Scope Use**
- Clinical or diagnostic applications
- Surveillance
- Privacy-invasive applications