---
license: apache-2.0
base_model:
- openai/whisper-large-v3
base_model_relation: quantized
pipeline_tag: automatic-speech-recognition
language:
- en
- zh
- de
- es
- ru
- ko
- fr
- ja
- pt
- tr
- pl
- ca
- nl
- ar
- sv
- it
- id
- hi
- fi
- vi
- he
- uk
- el
- ms
- cs
- ro
- da
- hu
- ta
- no
- th
- ur
- hr
- bg
- lt
- la
- mi
- ml
- cy
- sk
- te
- fa
- lv
- bn
- sr
- az
- sl
- kn
- et
- mk
- br
- eu
- is
- hy
- ne
- mn
- bs
- kk
- sq
- sw
- gl
- mr
- pa
- si
- km
- sn
- yo
- so
- af
- oc
- ka
- be
- tg
- sd
- gu
- am
- yi
- lo
- uz
- fo
- ht
- ps
- tk
- nn
- mt
- sa
- lb
- my
- bo
- tl
- mg
- as
- tt
- haw
- ln
- ha
- ba
- jw
- su
- yue
tags:
- audio
- automatic-speech-recognition
- speech-recognition
- whisper
- annthem
- qlip
- thestage
---

# Elastic model: Whisper Large v3. Fastest and most flexible models for self-hosting.

Elastic models are models produced by TheStage AI ANNA (Automated Neural Networks Accelerator). ANNA lets you control model size, latency, and quality with a simple slider movement. For each base model, ANNA produces a series of optimized versions:

* __S__: The fastest model, with optimized performance and minimal quality degradation, offering the best speed-accuracy tradeoff for production deployments.

__Goals of elastic models:__

* Provide flexibility in cost vs quality selection for inference
* Provide clear quality and latency benchmarks for speech recognition
* Provide the interface of the HF libraries `transformers` and `elastic_models`, so optimized versions can be used with a single line of code change (see the sketch after this list)
* Provide models supported on a wide range of hardware (NVIDIA GPUs), which are pre-compiled and require no JIT
* Provide the best models and service for self-hosting
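
A minimal sketch of that single line of code change, assuming only what the full example in the Inference section below shows (the `mode` argument selects the optimized variant):

```python
# Original import:
#   from transformers import WhisperForConditionalGeneration
# Elastic import -- the single changed line:
from elastic_models.transformers import WhisperForConditionalGeneration

# Loading mirrors `transformers`; `mode="S"` selects the optimized variant
# (see the full inference example below).
model = WhisperForConditionalGeneration.from_pretrained(
    "openai/whisper-large-v3", mode="S"
)
```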

> Note: all elastic model versions have been consolidated into a single optimized S model, which provides the best balance of speed and quality for Whisper Large v3.

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6799fc8e150f5a4014b030ca/V8hpZ-cA9vE5Ijyodp6Ih.png)

## Audio Examples

Below are examples demonstrating the transcription quality of the Elastic Whisper Large v3 S model compared to the original.

**Example Audio Transcriptions:**

| Audio Sample | Original Whisper Large v3 | Elastic S Model |
|--------------|---------------------------|-----------------|
| Sample 1 | [Transcription placeholder] | [Transcription placeholder] |
| Sample 2 | [Transcription placeholder] | [Transcription placeholder] |
| Sample 3 | [Transcription placeholder] | [Transcription placeholder] |

-----

## Inference

To run inference with our Whisper models, use the `elastic_models.transformers.WhisperForConditionalGeneration` class.

**Example using `elastic_models` with the optimized model:**

```python
import torch
import librosa
from transformers import AutoProcessor
from elastic_models.transformers import WhisperForConditionalGeneration

model_name = "openai/whisper-large-v3"
mode = "S"

audio_path = "path_to_your_audio.wav"
hf_token = "YOUR_TOKEN"
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load processor and model
processor = AutoProcessor.from_pretrained(model_name, token=hf_token)

model = WhisperForConditionalGeneration.from_pretrained(
    model_name,
    token=hf_token,
    torch_dtype=torch.float16,
    mode=mode,
    device_map=device,
)
model.eval()

# Load and process audio (Whisper expects 16 kHz input)
audio, sr = librosa.load(audio_path, sr=16000)
inputs = processor(audio, sampling_rate=sr, return_tensors="pt")
# Cast the features to fp16 to match the model weights
inputs = inputs.to(device, dtype=torch.float16)

print(f"Transcribing audio from: {audio_path}")
generate_kwargs = {"max_new_tokens": 100, "num_beams": 1}

# Generate transcription
with torch.inference_mode():
    generate_ids = model.generate(**inputs, **generate_kwargs)

# Decode the transcription
transcription = processor.batch_decode(
    generate_ids,
    skip_special_tokens=True,
    clean_up_tokenization_spaces=False,
)[0]

print(f"Transcription: {transcription}")
```
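
The processor pads or truncates input to Whisper's 30-second window, so the example above transcribes at most the first 30 seconds. A minimal sketch for longer files, reusing the `model`, `processor`, `audio`, and `device` from the example above (naive fixed-window chunking; words cut at chunk boundaries may be transcribed imperfectly):

```python
# Naive long-form transcription: split the waveform into 30 s windows
# and transcribe each window independently.
chunk_samples = 30 * 16000  # 30 seconds at 16 kHz

pieces = []
for start in range(0, len(audio), chunk_samples):
    chunk = audio[start:start + chunk_samples]
    chunk_inputs = processor(chunk, sampling_rate=16000, return_tensors="pt")
    chunk_inputs = chunk_inputs.to(device, dtype=torch.float16)
    with torch.inference_mode():
        ids = model.generate(**chunk_inputs, max_new_tokens=200)
    pieces.append(processor.batch_decode(ids, skip_special_tokens=True)[0].strip())

print(" ".join(pieces))
```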

__System requirements:__
* GPUs: NVIDIA GeForce RTX 4090, GeForce RTX 5090, L40S
* CPU: AMD, Intel
* Python: 3.8-3.12 (check dependencies for specific versions)

To work with our elastic models and compilation tools, install the `elastic_models` and `qlip` libraries from TheStage:

```shell
pip install thestage
pip install 'thestage-elastic-models[nvidia]'
pip install flash-attn==2.7.3 --no-build-isolation
pip uninstall -y apex
```

Then go to [app.thestage.ai](https://app.thestage.ai), log in, and generate an API token from your profile page. Set the API token as follows:

```shell
thestage config set --api-token <YOUR_API_TOKEN>
```

Congrats, now you can use accelerated models and tools!
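
As a quick, optional sanity check that the installation worked and a GPU is visible (a minimal sketch; it only exercises the import and CUDA detection, not the TheStage API):

```python
# Verifies the setup: the elastic_models import resolving and a CUDA
# device being visible are the two prerequisites for the examples above.
import torch
from elastic_models.transformers import WhisperForConditionalGeneration  # noqa: F401

assert torch.cuda.is_available(), "No CUDA device visible"
print(torch.cuda.get_device_name(0))  # e.g. one of the supported GPUs listed above
```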

----

## Benchmarks

Benchmarking is one of the most important procedures during model acceleration. We aim to provide clear performance metrics for Whisper models accelerated with our algorithms.

### Quality benchmarks

Performance evaluation on standard speech recognition benchmarks:

| Metric/Model | S | Original |
|--------------|---|----------|
| WER (Common Voice) | [TBD] | [TBD] |

* **WER (Word Error Rate)**: The primary metric for evaluating speech recognition accuracy; lower is better (see the sketch after this list).
* **Common Voice**: A multilingual speech recognition benchmark covering diverse languages and accents.
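
To make the metric concrete: WER counts the word-level substitutions, deletions, and insertions needed to turn the hypothesis into the reference, divided by the number of reference words. A minimal sketch using the open-source `jiwer` package (an assumption of this example, not part of this model card's tooling):

```python
import jiwer  # pip install jiwer

reference = "the quick brown fox jumps over the lazy dog"
hypothesis = "the quick brown fox jumped over the lazy dog"

# WER = (substitutions + deletions + insertions) / reference word count
# Here: 1 substitution ("jumps" -> "jumped") over 9 words ~= 0.111
print(jiwer.wer(reference, hypothesis))
```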

### Latency benchmarks (ms)

Performance for transcribing audio (ms):

**Batch Size 1:**

| GPU Type | S | Original |
|----------|---|----------|
| GeForce RTX 4090 | [TBD] | [TBD] |
| GeForce RTX 5090 | [TBD] | [TBD] |
| L40S | [TBD] | [TBD] |
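
Numbers like these can be reproduced with a simple timing loop. A minimal sketch, reusing `model` and `inputs` from the Inference section (assumes a CUDA device; `torch.cuda.synchronize` makes the wall-clock measurement honest):

```python
import time
import torch

# Warm-up run so one-time initialization doesn't skew the measurement
with torch.inference_mode():
    model.generate(**inputs, max_new_tokens=100)

torch.cuda.synchronize()
start = time.perf_counter()
with torch.inference_mode():
    model.generate(**inputs, max_new_tokens=100)
torch.cuda.synchronize()

print(f"Latency: {(time.perf_counter() - start) * 1000:.1f} ms")
```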

## Links

* __Platform__: [app.thestage.ai](https://app.thestage.ai)
* __Subscribe for updates__: [TheStageAI X (Twitter)](https://x.com/TheStageAI)
* __Contact email__: [email protected]