Example which works with optimum library
#6, opened by radovan86
Hi future reader,
Here is an example which works with the optimum library.
This example showcases how to translate audio from English to French. The file was recorded as a 16 kHz WAV file.
The code was tested on a Mac M1.
It is advisable to create a new virtual environment so that you start from a clean slate.
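Since the recording is expected to be a 16 kHz WAV file, it can be worth checking the file header before running anything heavier. The following is a minimal sketch using only the standard library `wave` module, which reads plain PCM WAV files. The helper names `describe_wav` and `write_silence` are made up for this illustration, and the silent demo file only stands in for a real recording.

```python
import os
import tempfile
import wave


def describe_wav(path):
    """Return (sample_rate_hz, channels, duration_seconds) for a PCM WAV file."""
    with wave.open(path, "rb") as wav_file:
        sample_rate = wav_file.getframerate()
        channels = wav_file.getnchannels()
        n_frames = wav_file.getnframes()
    return sample_rate, channels, n_frames / sample_rate


def write_silence(path, sample_rate=16_000, seconds=1):
    """Write a mono 16-bit WAV of silence, used here only as demo input."""
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(b"\x00\x00" * sample_rate * seconds)


# Create a throwaway demo file, then inspect it.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.wav")
write_silence(demo_path)
rate, channels, duration = describe_wav(demo_path)
print(f"{rate} Hz, {channels} channel(s), {duration:.2f} s")
```

To check your own recording, call `describe_wav("test-audio.wav")`. A rate other than 16000 is not a problem for the script in this post, because it resamples the audio anyway.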
Code:
import time

import torchaudio
from optimum.pipelines import pipeline
from transformers import GenerationConfig, WhisperProcessor

# File names of the ONNX encoder and decoder to load from the model repository.
model_kwargs = {"encoder_file_name": "encoder_model.onnx", "decoder_file_name": "decoder_model.onnx"}
model = "onnx-community/whisper-large-v3-turbo"

# Build a speech recognition pipeline that runs on ONNX Runtime ("ort") on the CPU.
onnx_asr = pipeline(
    "automatic-speech-recognition",
    model=model,
    # subfolder="onnx",
    model_kwargs=model_kwargs,
    accelerator="ort",
    device="cpu",
)

# Load the audio and resample it to 16 kHz.
# squeeze() drops the channel dimension, which assumes a mono recording.
audio, orig_freq = torchaudio.load("test-audio.wav")
audio_resampled = torchaudio.functional.resample(
    audio, orig_freq=orig_freq, new_freq=16_000
).numpy().squeeze()

# Force the language and task tokens at the start of decoding.
processor = WhisperProcessor.from_pretrained(model)
decoder_ids = processor.get_decoder_prompt_ids(language="fr", task="translate")
generation_config = GenerationConfig.from_pretrained(model)
generation_config.forced_decoder_ids = decoder_ids

print("start")
# perf_counter() measures elapsed wall-clock time; process_time_ns() would
# report CPU time summed over all threads of the process instead.
start_time = time.perf_counter()
result = onnx_asr(
    audio_resampled,
    return_timestamps=False,
    generate_kwargs={"generation_config": generation_config},
)
end_time = time.perf_counter()
print(f"Time taken: {end_time - start_time:.2f} seconds")
print(result)
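A note on timing inference calls like the one above: wall-clock time and CPU time are different measurements, and they can diverge noticeably when the runtime uses several threads, since `time.process_time()` counts CPU time for the whole process. Below is a minimal sketch of a helper that reports both. The name `stopwatch` is made up for this illustration, and `time.sleep` is only a stand-in for the pipeline call.

```python
import time
from contextlib import contextmanager


@contextmanager
def stopwatch(label):
    """Measure wall-clock and CPU time spent inside the `with` block."""
    timings = {}
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    try:
        yield timings
    finally:
        # Filled in when the block exits, even if it raised.
        timings["wall"] = time.perf_counter() - wall_start
        timings["cpu"] = time.process_time() - cpu_start
        print(f"{label}: wall {timings['wall']:.3f} s, cpu {timings['cpu']:.3f} s")


# Stand-in workload: sleeping takes wall-clock time but almost no CPU time.
with stopwatch("sleep") as t:
    time.sleep(0.2)
```

In the script above, the `with` block would wrap the pipeline call instead of `time.sleep`. For a "how long did I wait" number, the wall-clock value is the one to look at.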
requirements.txt:
aiohappyeyeballs==2.6.1
aiohttp==3.12.2
aiosignal==1.3.2
attrs==25.3.0
certifi==2025.4.26
charset-normalizer==3.4.2
coloredlogs==15.0.1
datasets==3.6.0
dill==0.3.8
filelock==3.18.0
flatbuffers==25.2.10
frozenlist==1.6.0
fsspec==2025.3.0
hf-xet==1.1.2
huggingface-hub==0.32.2
humanfriendly==10.0
idna==3.10
Jinja2==3.1.6
MarkupSafe==3.0.2
mpmath==1.3.0
multidict==6.4.4
multiprocess==0.70.16
networkx==3.4.2
numpy==2.2.6
onnx==1.18.0
onnxruntime==1.22.0
optimum @ git+https://github.com/huggingface/optimum@85376e337681b1db6aa9752cc4f592b56eedb85e
packaging==25.0
pandas==2.2.3
propcache==0.3.1
protobuf==6.31.1
pyarrow==20.0.0
python-dateutil==2.9.0.post0
pytz==2025.2
PyYAML==6.0.2
regex==2024.11.6
requests==2.32.3
safetensors==0.5.3
six==1.17.0
sympy==1.14.0
tokenizers==0.21.1
torch==2.7.0
torchaudio==2.7.0
tqdm==4.67.1
transformers==4.52.3
typing_extensions==4.13.2
tzdata==2025.2
urllib3==2.4.0
xxhash==3.5.0
yarl==1.20.0
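If the example misbehaves, one quick thing to rule out is a version mismatch with the pins above. The following is a minimal sketch that compares installed versions against a few of those pins using the standard library `importlib.metadata`. The function name `check_pins` is made up for this illustration, and the comparison is a plain string match, so a build suffix on an installed version would be reported as a mismatch.

```python
from importlib.metadata import PackageNotFoundError, version


def check_pins(pins):
    """Compare installed package versions against `name -> version` pins.

    Returns a list of (name, expected, installed) for every mismatch;
    installed is None when the package is not installed at all.
    """
    mismatches = []
    for name, expected in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        if installed != expected:
            mismatches.append((name, expected, installed))
    return mismatches


# A few pins copied from the requirements list above.
pins = {
    "transformers": "4.52.3",
    "onnxruntime": "1.22.0",
    "torch": "2.7.0",
    "torchaudio": "2.7.0",
}
for name, expected, installed in check_pins(pins):
    print(f"{name}: expected {expected}, found {installed}")
```

No output means the checked packages match the pinned versions.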
I hope this helps someone!