Den4ikAI committed on
Commit f353225 · verified · 1 Parent(s): de5ffbb

Update README.md

  1. README.md +3 -61
README.md CHANGED
@@ -108,66 +108,8 @@ base_model:
  pipeline_tag: automatic-speech-recognition
  ---
 
- # Den4ikAI/whisper-large-v2-no-digits-norm-punct
 
- This is a special version of the `openai/whisper-large-v2` model whose vocabulary has had all tokens corresponding to digits removed, as well as tokens with extraneous punctuation.
 
- The primary goal of this modification is to **force the model to generate numbers as words rather than digits**. This is extremely useful for text normalization tasks, for example when preparing data for text-to-speech (TTS) systems, where numbers need to be fully spelled out.
-
- ## Comparison with the Original Model
-
- Here’s a clear example demonstrating the difference in behavior between the models when transcribing the same audio clip containing the phrase “Билет стоил двадцать тысяч рублей” (“The ticket cost twenty thousand rubles”).
-
- | Model | Transcription Output |
- | ----------------------------------------------------------- | ------------------------------------------------------------------------------------------------------ |
- | `openai/whisper-large-v2` (Original) | `<\|startoftranscript\|><\|ru\|><\|transcribe\|><\|notimestamps\|> Билет стоил **20000** рублей.<\|endoftext\|>` |
- | `Den4ikAI/whisper-large-v2-no-digits-norm-punct` (This model) | `<\|startoftranscript\|><\|ru\|><\|transcribe\|><\|notimestamps\|> Билет стоил **двадцать тысяч** рублей.<\|endoftext\|>` |
-
- As you can see, this modified model correctly normalized the number into words, whereas the original version left it as digits.
-
- ## How to Use
-
- You can use this model just like any other Whisper model in the `transformers` library.
-
- ```python
- from transformers import WhisperProcessor, WhisperForConditionalGeneration
- import torchaudio
- import torch
-
- # Specify the device (GPU if available)
- device = "cuda:0" if torch.cuda.is_available() else "cpu"
-
- # Load the audio file
- wav, sr = torchaudio.load("numbers5.mp3")
- # Convert to mono and resample to 16 kHz
- if wav.shape[0] > 1:
-     wav = torch.mean(wav, dim=0, keepdim=True)
- resampler = torchaudio.transforms.Resample(sr, 16000)
- wav = resampler(wav)
- audio_input = wav.squeeze(0)
-
- # Load the processor and model
- model_id = "Den4ikAI/whisper-large-v2-no-digits-norm-punct"
- processor = WhisperProcessor.from_pretrained(model_id)
- model = WhisperForConditionalGeneration.from_pretrained(model_id).to(device)
-
- # Prepare inputs and extract features
- input_features = processor(
-     audio_input,
-     sampling_rate=16000,
-     return_tensors="pt"
- ).input_features.to(device)
-
- # Generate token IDs (for Russian specify language="russian")
- predicted_ids = model.generate(input_features, language="russian", task="transcribe")
-
- # Decode tokens back to text
- transcription = processor.batch_decode(
-     predicted_ids,
-     skip_special_tokens=False
- )
-
- print(transcription)
-
- # Example output for an audio clip with numbers:
- # ['<|startoftranscript|><|ru|><|transcribe|><|notimestamps|> Билет стоил двадцать тысяч рублей.<|endoftext|>']
 
+ # Den4ikAI/faster-whisper-large-v2-no-digits-norm-punct
 
+ CTranslate2 version of https://huggingface.co/Den4ikAI/whisper-large-v2-no-digits-norm-punct
 
+ Upstream CTranslate2 cannot be used as-is here, so you'll have to build the improved version of CTranslate2 yourself. See here: https://github.com/Den4ikAI/CTranslate2/
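
The new README no longer carries a usage snippet, so here is a minimal sketch of how the CTranslate2 model might be loaded through the `faster-whisper` package. This is an assumption, not something the commit states: it presumes `faster-whisper` is installed on top of the CTranslate2 build linked above, and the `device` and `compute_type` values are illustrative. The helper names `join_segments` and `transcribe` are hypothetical.

```python
from typing import Iterable

# Model ID taken from the new README heading.
MODEL_ID = "Den4ikAI/faster-whisper-large-v2-no-digits-norm-punct"


def join_segments(segments: Iterable) -> str:
    """Concatenate the .text of each segment into a single transcript."""
    return " ".join(seg.text.strip() for seg in segments)


def transcribe(audio_path: str, language: str = "ru") -> str:
    """Transcribe one audio file with faster-whisper (assumed to be installed)."""
    # Imported lazily so this module still loads without faster-whisper.
    from faster_whisper import WhisperModel

    # device/compute_type are illustrative defaults, not taken from the README.
    model = WhisperModel(MODEL_ID, device="auto", compute_type="default")
    # transcribe() returns a lazy generator of segments plus an info object.
    segments, _info = model.transcribe(audio_path, language=language)
    return join_segments(segments)
```

Calling `transcribe("numbers5.mp3")` (the file name used in the previous README) should then return the transcript as plain text, provided the patched CTranslate2 build is the one `faster-whisper` picks up.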