tiantiaf/whisper-large-v3-narrow-accent

Audio Classification

model_hub_mixin

pytorch_model_hub_mixin

speaker_accent_classification


tiantiaf committed 18 days ago

Commit 1b122c7 (verified) · 1 Parent(s): a585e16

Update README.md

Files changed (1)

README.md +5 -2


@@ -70,8 +70,11 @@ english_accent_list = [
     'South African', 'Southeast Asia', 'South Asia', 'Welsh'
 ]
-# Load data, here just zeros as the example, audio data should be 16kHz mono channel
-data = torch.zeros([1, 16000]).float().to(device)
+# Load data, here just zeros as the example
+# Our training data filters out audio shorter than 3 seconds (unreliable predictions) and longer than 15 seconds (computation limitation)
+# So you need to prepare your audio to a maximum of 15 seconds, 16kHz and mono channel
+max_audio_length = 15 * 16000
+data = torch.zeros([1, 16000]).float().to(device)[:, :max_audio_length]
 logits, embeddings = model(data, return_feature=True)
 # Probability and output
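
The constraints added in this change (16kHz, mono channel, at most 15 seconds, with clips under 3 seconds considered unreliable) could be applied to real audio with a small helper along these lines. This is only a sketch: `prepare_audio` and the constant names are hypothetical and not part of the repository, and it assumes the waveform is already a `torch.Tensor` shaped `[channels, samples]` or `[samples]`. It does not resample; it only rejects audio that is not already 16kHz.

```python
import warnings

import torch

SAMPLE_RATE = 16000   # the README asks for 16kHz audio
MIN_SECONDS = 3       # shorter clips were filtered out of the training data
MAX_SECONDS = 15      # longer clips were filtered out of the training data


def prepare_audio(waveform: torch.Tensor, sample_rate: int) -> torch.Tensor:
    """Hypothetical helper: return a [1, T] float32 mono tensor, T <= 15 seconds."""
    if sample_rate != SAMPLE_RATE:
        # Resampling is out of scope for this sketch; do it before calling.
        raise ValueError(
            f"expected {SAMPLE_RATE} Hz audio, got {sample_rate} Hz; resample first"
        )
    if waveform.dim() == 1:
        # Treat a flat vector as a single channel.
        waveform = waveform.unsqueeze(0)
    if waveform.size(0) > 1:
        # Downmix multi-channel audio by averaging the channels.
        waveform = waveform.float().mean(dim=0, keepdim=True)
    # Keep at most the first 15 seconds, as in the README snippet.
    waveform = waveform[:, : MAX_SECONDS * SAMPLE_RATE]
    if waveform.size(1) < MIN_SECONDS * SAMPLE_RATE:
        warnings.warn("audio is shorter than 3 seconds; predictions may be unreliable")
    return waveform.float()
```

The result could then be moved to the device and passed to the model as in the README, for example `model(prepare_audio(wav, sr).to(device), return_feature=True)`, where `wav` and `sr` are whatever your audio loader returned.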