Translating English Audio Into Spanish Text

#61
by stvnchnsn - opened

I'm trying to translate audio that is in English into Spanish text using the code below. No errors occur, but the output text is in English with no translation performed. Any clues?

import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline

device = "cuda:0" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32

model_id = "openai/whisper-large-v3"

model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_id, torch_dtype=torch_dtype, low_cpu_mem_usage=True, use_safetensors=True
)
model.to(device)

processor = AutoProcessor.from_pretrained(model_id)

translate_pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    max_new_tokens=128,
    chunk_length_s=30,
    batch_size=1,
    torch_dtype=torch_dtype,
    device=device,
    return_timestamps=True,
    generate_kwargs={"language": "spanish", "task": "translate"}
)

"language" parameter is used to indicate the spoken language in the audio.
The "translate" parameter indicates that the speech must be translated into English.

Whisper was trained on speech recognition (audio in X -> text in X) and speech translation to English (audio in X -> text in En)

You can also 'trick' it into performing more general speech translation (audio in X -> text in Y) with reasonable results, though not as good as the trained tasks. Just set the language to your target language and the task to "transcribe":

import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline

device = "cuda:0" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32

model_id = "openai/whisper-large-v3"

model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_id, torch_dtype=torch_dtype, low_cpu_mem_usage=True, use_safetensors=True
)
model.to(device)

processor = AutoProcessor.from_pretrained(model_id)

translate_pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    max_new_tokens=128,
    chunk_length_s=30,
    batch_size=1,
    torch_dtype=torch_dtype,
    device=device,
    return_timestamps=True,
    generate_kwargs={"language": "spanish", "task": "transcribe"}
)
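To see why swapping the task changes the output language, here is a conceptual sketch of what these flags do under the hood. Whisper steers its decoder with special prompt tokens built from the language and task settings; the token names below mirror Whisper's actual special tokens, but the helper function itself is a hypothetical illustration, not part of the transformers API.

```python
def build_decoder_prompt(language_code: str, task: str, timestamps: bool = True) -> list[str]:
    """Sketch of the forced decoder prompt Whisper starts generation with."""
    if task not in ("transcribe", "translate"):
        raise ValueError(f"unknown task: {task}")
    prompt = ["<|startoftranscript|>", f"<|{language_code}|>", f"<|{task}|>"]
    if not timestamps:
        # Without timestamps, an extra token suppresses timestamp prediction.
        prompt.append("<|notimestamps|>")
    return prompt

# Trained speech translation: the output language is always English.
print(build_decoder_prompt("en", "translate"))
# -> ['<|startoftranscript|>', '<|en|>', '<|translate|>']

# The 'trick': declare Spanish as the language and ask for "transcribe",
# so the model decodes Spanish text even though the audio is English.
print(build_decoder_prompt("es", "transcribe"))
# -> ['<|startoftranscript|>', '<|es|>', '<|transcribe|>']
```

Because generation is forced to begin with the `<|es|>` token, the decoder produces Spanish text regardless of the audio's actual language.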

Is there any way to improve performance? I didn't find any dataset (English audio - Spanish text) for fine-tuning.


@Daniel981215
You could obtain such a dataset by taking an English speech-to-text dataset and translating the English transcripts to Spanish (using open-source or cloud translation solutions).
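The suggestion above can be sketched as follows. Here `translate_en_to_es` is a hypothetical stand-in using a toy lookup table; in practice you would replace it with a real machine-translation model (for example a Hugging Face translation pipeline) or a cloud translation API, and the field names `audio`/`sentence` are assumptions about the source dataset's schema.

```python
def translate_en_to_es(text: str) -> str:
    """Hypothetical placeholder for a real English->Spanish translator."""
    lookup = {"hello world": "hola mundo"}  # toy mapping for illustration only
    return lookup.get(text, text)

def build_translation_dataset(examples: list[dict]) -> list[dict]:
    """Pair each audio clip with a Spanish translation of its English transcript."""
    return [
        {"audio": ex["audio"], "sentence": translate_en_to_es(ex["sentence"])}
        for ex in examples
    ]

english_asr = [{"audio": "clip_0001.wav", "sentence": "hello world"}]
print(build_translation_dataset(english_asr))
# -> [{'audio': 'clip_0001.wav', 'sentence': 'hola mundo'}]
```

The resulting (English audio, Spanish text) pairs can then be used to fine-tune Whisper on the translate-to-Spanish direction; translation quality of the MT system puts an upper bound on what the fine-tuned model can learn.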

I’ve created a code-switched language dataset for fine-tuning Whisper, including audio data along with CSV and Parquet files, which I’ve stored on Hugging Face. After preparing the dataset, I fine-tuned the model for translation. You can explore the entire end-to-end project in my repo. Here’s the link to check it out: https://github.com/pr0mila/MediBeng-Whisper-Tiny
