---
base_model:
- openai/whisper-large-v3
datasets:
- mozilla-foundation/common_voice_11_0
- ajd12342/paraspeechcaps
language:
- en
license: openrail
metrics:
- accuracy
pipeline_tag: audio-classification
tags:
- model_hub_mixin
- pytorch_model_hub_mixin
- speaker_dialect_classification
library_name: transformers
---

# Whisper-Large v3 for English Dialect Classification

# Model Description
This model implements the English dialect classification described in Voxlect: A Speech Foundation Model Benchmark for Modeling Dialect and Regional Languages Around the Globe.

Github repository: https://github.com/tiantiaf0627/voxlect

The included English dialects are:
```
[
    'East Asia',
    'English',
    'Germanic',
    'Irish',
    'North America',
    'Northern Irish',
    'Oceania',
    'Other',
    'Romance',
    'Scottish',
    'Semitic',
    'Slavic',
    'South African',
    'Southeast Asia',
    'South Asia',
    'Welsh'
]
```

Compared to the Vox-Profile English accent/dialect models, we trained with additional speech data from TIMIT and ParaSpeechCaps.

# How to use this model

## Download repo
```bash
git clone git@github.com:tiantiaf0627/voxlect.git
```
## Install the package
```bash
conda create -n voxlect python=3.8
cd voxlect
pip install -e .
```

## Load the model
```python
# Load libraries
import torch
import torch.nn.functional as F
from src.model.dialect.whisper_dialect import WhisperWrapper

# Find device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load model from Huggingface
model = WhisperWrapper.from_pretrained("tiantiaf/voxlect-english-dialect-whisper-large-v3").to(device)
model.eval()
```

## Prediction
```python
# Label list
dialect_list = [
    'East Asia',
    'English',
    'Germanic',
    'Irish',
    'North America',
    'Northern Irish',
    'Oceania',
    'Other',
    'Romance',
    'Scottish',
    'Semitic',
    'Slavic',
    'South African',
    'Southeast Asia',
    'South Asia',
    'Welsh'
]

# Load data; here we use zeros as a placeholder example
# Our training data filters out audio shorter than 3 seconds (unreliable predictions)
# and longer than 15 seconds (computation limitation), so prepare your audio as
# 16 kHz mono with a maximum length of 15 seconds
max_audio_length = 15 * 16000
data = torch.zeros([1, 16000]).float().to(device)[:, :max_audio_length]
logits, embeddings = model(data, return_feature=True)

# Probability and predicted dialect
dialect_prob = F.softmax(logits, dim=1)
print(dialect_list[torch.argmax(dialect_prob).detach().cpu().item()])
```
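
The model expects 16 kHz mono audio of at most 15 seconds, as noted in the comments above. The sketch below illustrates that preparation with plain PyTorch; the `prepare_audio` helper is hypothetical (not part of Voxlect), and resampling is left to a library such as torchaudio if your source rate differs from 16 kHz.

```python
import torch

def prepare_audio(waveform: torch.Tensor, sample_rate: int) -> torch.Tensor:
    """Downmix to mono and truncate to 15 s; the model expects 16 kHz mono input."""
    # Downmix multi-channel audio to mono by averaging channels.
    if waveform.dim() == 2 and waveform.size(0) > 1:
        waveform = waveform.mean(dim=0, keepdim=True)
    # Resampling (e.g. torchaudio.functional.resample) is required if the
    # source rate is not 16 kHz; omitted here to keep the sketch dependency-free.
    assert sample_rate == 16000, "resample to 16 kHz first"
    # Truncate to the 15-second maximum used during training.
    max_audio_length = 15 * 16000
    return waveform[:, :max_audio_length]

# Example: a 20-second mono clip at 16 kHz is truncated to 15 seconds.
clip = torch.zeros([1, 20 * 16000])
prepared = prepare_audio(clip, 16000)
print(prepared.shape)  # torch.Size([1, 240000])
```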

Responsible Use: Users should respect the privacy and consent of data subjects, and adhere to the relevant laws and regulations in their jurisdictions when using Voxlect.

## If you have any questions, please contact: Tiantian Feng ([email protected])

❌ **Out-of-Scope Use**
- Clinical or diagnostic applications
- Surveillance
- Privacy-invasive applications