nielsr HF Staff committed on
Commit
3f86815
·
verified ·
1 Parent(s): bbbdeeb

Fix pipeline tag, add library_name and link to code

This PR ensures the model can be found at https://huggingface.co/models?pipeline_tag=automatic-speech-recognition and adds the `library_name`.
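
As a rough way to confirm that a model card header carries the fields this PR is concerned with, the sketch below parses the YAML front matter with the standard library only and reports which of `pipeline_tag` and `library_name` are absent. The helper `read_front_matter` is hypothetical (it is not part of the Hub tooling) and only understands the flat `key: value` and `key:` plus `- item` shapes that appear in this diff. The sample header is copied from the new side of the diff, where the tag reads `audio-classification`.

```python
def read_front_matter(readme_text: str) -> dict:
    """Parse a flat model card header (hypothetical helper, not a full YAML parser)."""
    lines = readme_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}  # no front matter block at the top of the file
    meta = {}
    current_list_key = None  # key that is currently collecting "- item" entries
    for line in lines[1:]:
        stripped = line.strip()
        if stripped == "---":
            break  # end of the front matter block
        if not stripped:
            continue
        if stripped.startswith("- ") and current_list_key is not None:
            meta[current_list_key].append(stripped[2:].strip())
        elif ":" in stripped:
            key, _, value = stripped.partition(":")
            key, value = key.strip(), value.strip()
            if value:
                meta[key] = value          # scalar entry, e.g. "license: apache-2.0"
                current_list_key = None
            else:
                meta[key] = []             # list entry, items follow on later lines
                current_list_key = key
    return meta


# Header as it appears on the new side of this diff
README = """---
base_model:
- microsoft/wavlm-large
datasets:
- ajd12342/paraspeechcaps
language:
- en
license: apache-2.0
metrics:
- accuracy
pipeline_tag: audio-classification
tags:
- model_hub_mixin
- pytorch_model_hub_mixin
library_name: transformers
---
"""

meta = read_front_matter(README)
missing = [key for key in ("pipeline_tag", "library_name") if key not in meta]
print("missing fields:", missing)
```

For real model cards with nested or quoted YAML, a proper YAML parser would be the safer choice; this sketch only covers the shapes shown here.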

Files changed (1)
  1. README.md +15 -15
README.md CHANGED
@@ -1,19 +1,21 @@
1
  ---
2
- tags:
3
- - model_hub_mixin
4
- - pytorch_model_hub_mixin
5
- license: bsd-2-clause
6
  language:
7
  - en
 
8
  metrics:
9
  - accuracy
10
- base_model:
11
- - openai/whisper-large-v3
12
- datasets:
13
- - ajd12342/paraspeechcaps
14
  pipeline_tag: audio-classification
 
 
 
 
15
  ---
16
- # Whisper Large v3 for Voice (Sounding) Quality Classification
 
17
 
18
  # Model Description
19
  This model includes the implementation of voice quality classification described in Vox-Profile: A Speech Foundation Model Benchmark for Characterizing Diverse Speaker and Speech Traits (https://arxiv.org/pdf/2505.14648)
@@ -23,7 +25,6 @@ Specifically, we report speaker-level Macro-F1 scores. Specifically, we randomly
23
  ### Special Note:
24
  We exclude EARS from ParaSpeechCaps due to its limited number of samples in the holdout set.
25
 
26
-
27
  The included labels are:
28
  <pre>
29
  [
@@ -35,8 +36,8 @@ The included labels are:
35
  ]
36
  </pre>
37
 
38
- - Library: https://github.com/tiantiaf0627/vox-profile-release
39
 
 
40
  # How to use this model
41
 
42
  ## Download repo
@@ -55,11 +56,11 @@ pip install -e .
55
  # Load libraries
56
  import torch
57
  import torch.nn.functional as F
58
- from src.model.voice_quality.whisper_voice_quality import WhisperWrapper
59
  # Find device
60
  device = torch.device("cuda") if torch.cuda.is_available() else "cpu"
61
  # Load model from Huggingface
62
- model = WhisperWrapper.from_pretrained("tiantiaf/whisper-large-v3-voice-quality").to(device)
63
  model.eval()
64
  ```
65
 
@@ -76,7 +77,7 @@ voice_quality_label_list = [
76
 
77
  # Load data, here just zeros as the example
78
  # Our training data filters output audio shorter than 3 seconds (unreliable predictions) and longer than 15 seconds (computation limitation)
79
- # So you need to prepare your audio to a maximum of 15 seconds, 16kHz and mono channel
80
  max_audio_length = 15 * 16000
81
  data = torch.zeros([1, 16000]).float().to(device)[:, :max_audio_length]
82
  logits = model(
@@ -92,7 +93,6 @@ threshold = 0.7
92
  predictions = (voice_quality_prob > threshold).int().detach().cpu().numpy()[0].tolist()
93
  for label_idx in range(len(predictions)):
94
  if predictions[label_idx] == 1: voice_label.append(voice_quality_label_list[label_idx])
95
-
96
  # print the voice quality labels
97
  print(voice_label)
98
  ```
 
1
  ---
2
+ base_model:
3
+ - microsoft/wavlm-large
4
+ datasets:
5
+ - ajd12342/paraspeechcaps
6
  language:
7
  - en
8
+ license: apache-2.0
9
  metrics:
10
  - accuracy
 
 
 
 
11
  pipeline_tag: audio-classification
12
+ tags:
13
+ - model_hub_mixin
14
+ - pytorch_model_hub_mixin
15
+ library_name: transformers
16
  ---
17
+
18
+ # WavLM-Large for Voice (Sounding) Quality Classification
19
 
20
  # Model Description
21
  This model includes the implementation of voice quality classification described in Vox-Profile: A Speech Foundation Model Benchmark for Characterizing Diverse Speaker and Speech Traits (https://arxiv.org/pdf/2505.14648)
 
25
  ### Special Note:
26
  We exclude EARS from ParaSpeechCaps due to its limited number of samples in the holdout set.
27
 
 
28
  The included labels are:
29
  <pre>
30
  [
 
36
  ]
37
  </pre>
38
 
 
39
 
40
+ - Library: https://github.com/tiantiaf0627/vox-profile-release
41
  # How to use this model
42
 
43
  ## Download repo
 
56
  # Load libraries
57
  import torch
58
  import torch.nn.functional as F
59
+ from src.model.voice_quality.wavlm_voice_quality import WavLMWrapper
60
  # Find device
61
  device = torch.device("cuda") if torch.cuda.is_available() else "cpu"
62
  # Load model from Huggingface
63
+ model = WavLMWrapper.from_pretrained("tiantiaf/wavlm-large-voice-quality").to(device)
64
  model.eval()
65
  ```
66
 
 
77
 
78
  # Load data, here just zeros as the example
79
  # Our training data filters output audio shorter than 3 seconds (unreliable predictions) and longer than 15 seconds (computation limitation)
80
+ # So you need to prepare your audio to a maximum of 15 seconds, 16kHz, and mono channel
81
  max_audio_length = 15 * 16000
82
  data = torch.zeros([1, 16000]).float().to(device)[:, :max_audio_length]
83
  logits = model(
 
93
  predictions = (voice_quality_prob > threshold).int().detach().cpu().numpy()[0].tolist()
94
  for label_idx in range(len(predictions)):
95
  if predictions[label_idx] == 1: voice_label.append(voice_quality_label_list[label_idx])
 
96
  # print the voice quality labels
97
  print(voice_label)
98
  ```