whisper-timestamped-cs
Summary
"whisper-timestamped-cs" is an acoustic model based on "openai/whisper-large-v3", suitable for Automatic Speech Recognition in code-switching conditions between Spanish and Catalan.
Model Description
"whisper-timestamped-cs" is an acoustic model suitable for Automatic Speech Recognition in code-switching conditions between Spanish and Catalan. It is the result of fine-tuning "openai/whisper-large-v3" on 2 hours of synthetic Spanish/Catalan code-switching data generated by Projecte AINA in Barcelona, Spain.
Intended Uses and Limitations
This model can be used for Automatic Speech Recognition (ASR) in code-switching conditions between Spanish and Catalan. The model is intended to transcribe audio files to plain text.
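As an illustration of the intended use, the following is a minimal sketch of transcribing a single audio file to plain text with the Transformers automatic-speech-recognition pipeline. It is not taken from the authors' notebook. The helper name transcribe_file, the 30-second chunk length and the file path in the usage comment are assumptions made for this example. Decoding a file path also assumes ffmpeg is available on the system.

```python
# Minimal sketch (assumption: not the authors' reference code).
MODEL_NAME = "langtech-veu/whisper-timestamped-cs"


def transcribe_file(audio_path: str, device: str = "cpu") -> str:
    """Return the plain-text transcription of a single audio file."""
    # Imported lazily so that defining this helper does not require
    # loading the heavy dependency up front.
    from transformers import pipeline

    asr = pipeline(
        "automatic-speech-recognition",
        model=MODEL_NAME,
        device=device,       # e.g. "cuda:0" when a GPU is available
        chunk_length_s=30,   # assumed value: split long audio into 30 s windows
    )
    # The pipeline returns a dict; its "text" entry holds the transcription.
    return asr(audio_path)["text"].strip()


# Example call ("sample.wav" is a hypothetical path):
#   print(transcribe_file("sample.wav"))
```

The first call downloads the model weights, so it can take a while. On a machine with a GPU, passing device="cuda:0" should speed up inference considerably.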
How to Get Started with the Model
For an up-to-date, working version of this code, please see our Notebook.
Installation
To use this model, you may install datasets and transformers:
Create a virtual environment:
python -m venv /path/to/venv
Activate the environment:
source /path/to/venv/bin/activate
Install the modules:
pip install datasets transformers
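After installing, a quick check along these lines can confirm that both packages are visible from the active environment. This is only a sketch using the standard library. The helper name installed_versions is made up for this example.

```python
from importlib import metadata

# Packages named in the installation step above.
REQUIRED = ["datasets", "transformers"]


def installed_versions(packages):
    """Map each package name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions(REQUIRED).items():
    print(f"{name}: {version or 'NOT INSTALLED'}")
```

If a package is reported as not installed, the virtual environment is probably not activated in the current shell.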
For Inference
To transcribe Catalan audio with this model and compute the word error rate (WER) on a test set, you can follow this example:
#Install Prerequisites
pip install torch
pip install datasets
pip install 'transformers[torch]'
pip install evaluate
pip install jiwer
#This code requires a CUDA-capable GPU.
#Note (November 2024): load_metric is no longer part of datasets.
#Use load from the evaluate package instead, as done at the end of this example.
import torch
from transformers import WhisperForConditionalGeneration, WhisperProcessor
#Load the processor and model.
MODEL_NAME="langtech-veu/whisper-timestamped-cs"
processor = WhisperProcessor.from_pretrained(MODEL_NAME)
model = WhisperForConditionalGeneration.from_pretrained(MODEL_NAME).to("cuda")
#Load the dataset
from datasets import load_dataset, Audio
ds=load_dataset("projecte-aina/parlament_parla",split='test')
#Downsample to 16kHz
ds = ds.cast_column("audio", Audio(sampling_rate=16_000))
#Process the dataset
def map_to_pred(batch):
    audio = batch["audio"]
    input_features = processor(audio["array"], sampling_rate=audio["sampling_rate"], return_tensors="pt").input_features
    batch["reference"] = processor.tokenizer._normalize(batch['normalized_text'])
    with torch.no_grad():
        predicted_ids = model.generate(input_features.to("cuda"))[0]
    transcription = processor.decode(predicted_ids)
    batch["prediction"] = processor.tokenizer._normalize(transcription)
    return batch
#Do the evaluation
result = ds.map(map_to_pred)
#Compute the overall WER now.
from evaluate import load
wer = load("wer")
WER=100 * wer.compute(references=result["reference"], predictions=result["prediction"])
print(WER)
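To make the reported number easier to interpret, here is a small pure-Python sketch of what a word error rate measures: the word-level edit distance (substitutions, deletions and insertions) between references and predictions, divided by the number of reference words. It is an illustration, not the implementation used by the evaluate package. It assumes the score is aggregated over the whole corpus rather than averaged per utterance. The function names and the example sentences are made up.

```python
def word_edit_distance(reference: str, hypothesis: str) -> int:
    """Minimum number of word substitutions, deletions and insertions."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j]: edits needed to turn the reference words seen so far
    # into the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def corpus_wer(references, predictions) -> float:
    """WER in percent: total word errors over total reference words."""
    total_words = sum(len(r.split()) for r in references)
    if total_words == 0:
        raise ValueError("references contain no words")
    errors = sum(word_edit_distance(r, p)
                 for r, p in zip(references, predictions))
    return 100 * errors / total_words


# Made-up example: one substituted word out of seven reference words.
refs = ["bon dia a tothom", "gracias por venir"]
preds = ["bon dia a todos", "gracias por venir"]
print(f"{corpus_wer(refs, preds):.2f}")  # 14.29
```

Because insertions count as errors, a WER above 100 is possible when the predictions are much longer than the references.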
Training Details
Training data
The model was trained on a corpus called CAESAR-tiny, which has not yet been released.
Training procedure
This model is the result of fine-tuning the model "openai/whisper-large-v3" by following this tutorial provided by Hugging Face.
Training Hyperparameters
Citation
If this model contributes to your research, please cite the work:
@misc{BSC2025whispertimestampedcs,
title={ASR models for Catalan and Spanish CS: whisper-timestamped-cs.},
author={Takanori, Lucas and Solito, Sarah and Messaoudi, Abir and España i Bonet, Cristina},
organization={Barcelona Supercomputing Center},
url={https://huggingface.co/langtech-veu/whisper-timestamped-cs},
year={2025}
}
Additional Information
Author
The fine-tuning process was performed during 2025 in the Language Technologies Laboratory of the Barcelona Supercomputing Center.
Contact
For further information, please email [email protected].
Copyright
Copyright (c) 2025 by Language Technologies Laboratory, Barcelona Supercomputing Center.
License
Funding
This work has been promoted and financed by the Generalitat de Catalunya through the Aina project.
Training the model was made possible by the computing time provided by the Barcelona Supercomputing Center through MareNostrum 5.