Susurro: Spanish Speech Recognition Model
Model Description
Susurro is a fine-tuned version of OpenAI's Whisper large-v3-turbo model (openai/whisper-large-v3-turbo), optimized for Spanish speech recognition. It was fine-tuned on Spanish speech data to improve transcription quality on Spanish audio.
Training Data
The model was trained on a Spanish speech dataset with the following characteristics:
- Training set: Spanish speech audio samples
- Test set: a separate, held-out set of audio samples used for validation
- Audio sampling rate: 16 kHz
- Language: Spanish
- Task: Speech transcription
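The card does not say how the audio and transcripts were turned into model inputs. A common recipe for Whisper fine-tuning with 🤗 Transformers converts each 16 kHz waveform into log-Mel input features and each transcript into label token IDs; the sketch below shows that recipe under stated assumptions. The function name `prepare_example` and the column names `audio` and `sentence` are hypothetical, and this is not presented as the author's actual preprocessing code.

```python
from typing import Any, Dict


def prepare_example(batch: Dict[str, Any], processor: Any) -> Dict[str, Any]:
    """Hypothetical preprocessing for one example (assumed recipe, not the author's code).

    Assumes batch["audio"] is a dict with "array" and "sampling_rate" keys
    (the layout of a datasets Audio column) and batch["sentence"] holds the
    Spanish transcript. `processor` is expected to be a WhisperProcessor.
    """
    audio = batch["audio"]
    if audio["sampling_rate"] != 16000:
        raise ValueError(f"expected 16 kHz audio, got {audio['sampling_rate']} Hz")
    # Log-Mel spectrogram features for the encoder (padded/truncated to 30 s)
    batch["input_features"] = processor.feature_extractor(
        audio["array"], sampling_rate=audio["sampling_rate"]
    ).input_features[0]
    # Token IDs of the transcript, used as the decoder labels
    batch["labels"] = processor.tokenizer(batch["sentence"]).input_ids
    return batch
```

With a `datasets.Dataset`, a function like this is typically applied through `dataset.map(prepare_example, fn_kwargs={"processor": processor})`.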
Training Procedure
The model was trained using the following configuration:
- Base model: openai/whisper-large-v3-turbo
- Training type: Fine-tuning
- Batch size: 2 per device
- Gradient accumulation steps: 16
- Learning rate: 1e-5
- Warmup steps: 500
- Max steps: 8000
- Training optimizations:
  - Gradient checkpointing enabled
  - FP16 training
  - 8-bit Adam optimizer
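The hyperparameters above can be collected into a plain dictionary, as in the sketch below. The key names follow `transformers.Seq2SeqTrainingArguments`, where `optim="adamw_bnb_8bit"` selects the bitsandbytes 8-bit AdamW optimizer, but the training script is not published, so this mapping is an assumption. The learning-rate helper also assumes the Trainer's default linear schedule: linear warmup over 500 steps, then linear decay to zero at step 8000.

```python
# Hypothetical reconstruction of the training configuration listed above.
# Keys follow transformers.Seq2SeqTrainingArguments naming (assumed mapping).
TRAINING_CONFIG = {
    "per_device_train_batch_size": 2,
    "gradient_accumulation_steps": 16,
    "learning_rate": 1e-5,
    "warmup_steps": 500,
    "max_steps": 8000,
    "gradient_checkpointing": True,
    "fp16": True,
    "optim": "adamw_bnb_8bit",  # assumed name for the 8-bit Adam(W) optimizer
}


def effective_batch_size(config: dict, num_devices: int = 1) -> int:
    """Number of examples contributing to each optimizer update."""
    return (
        config["per_device_train_batch_size"]
        * config["gradient_accumulation_steps"]
        * num_devices
    )


def learning_rate_at(step: int, config: dict) -> float:
    """Learning rate under an assumed linear warmup + linear decay schedule."""
    peak = config["learning_rate"]
    warmup = config["warmup_steps"]
    total = config["max_steps"]
    if step < warmup:
        return peak * step / max(1, warmup)
    return peak * max(0.0, (total - step) / max(1, total - warmup))


if __name__ == "__main__":
    print(effective_batch_size(TRAINING_CONFIG))  # 32 on a single device
    print(learning_rate_at(250, TRAINING_CONFIG))  # halfway through warmup
```

On a single GPU this gives 2 × 16 = 32 examples per update, so 8,000 steps cover 256,000 training examples (more if several GPUs were used; the card does not state how many). The dictionary could be unpacked into `Seq2SeqTrainingArguments` together with an `output_dir`, although `fp16=True` requires a CUDA device.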
Intended Uses
This model is designed for:
- Spanish speech recognition
- Audio transcription in Spanish
- Real-time speech-to-text applications
How to Use
from transformers import WhisperProcessor, WhisperForConditionalGeneration
import librosa  # extra dependency (pip install librosa), used only to load and resample audio
import torch

# Load model and processor
model_id = "IsmaelRR/SusurroModel-WhisperTurboV3Spanish"
processor = WhisperProcessor.from_pretrained(model_id)
model = WhisperForConditionalGeneration.from_pretrained(model_id)

# Use a GPU if one is available
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)

# Load your audio file and resample it to 16 kHz
# (replace "audio.wav" with the path to your recording)
audio_array, sampling_rate = librosa.load("audio.wav", sr=16000)

# Convert the waveform to log-Mel input features (inputs longer than 30 s are truncated)
input_features = processor(
    audio_array,
    sampling_rate=sampling_rate,
    return_tensors="pt"
).input_features.to(device)

# Generate and decode the transcription
predicted_ids = model.generate(input_features)
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)
print(transcription[0])
Limitations
- The model is specifically trained for Spanish language and may not perform well with other languages
- Audio input must be sampled at 16 kHz; resample recordings at other rates before inference
- Performance may vary with different audio qualities and accents
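Whisper's feature extractor expects 16 kHz input and pads or truncates each input to 30 seconds (480,000 samples at 16 kHz), so longer recordings need to be split before transcription. The helper below is a minimal sketch of that check and split, assuming the audio is already a mono NumPy array; `split_into_windows` is a hypothetical name, and fixed, non-overlapping windows can cut a word in half at a boundary. The Transformers `pipeline("automatic-speech-recognition", ..., chunk_length_s=30)` offers overlapping chunking as a more robust alternative.

```python
import numpy as np

TARGET_SAMPLING_RATE = 16_000  # Whisper models expect 16 kHz audio
WINDOW_SECONDS = 30            # Whisper's fixed input length in seconds


def split_into_windows(audio: np.ndarray, sampling_rate: int,
                       window_seconds: int = WINDOW_SECONDS) -> list:
    """Split mono 16 kHz audio into consecutive, non-overlapping windows.

    Raises ValueError for any other sampling rate, because feeding e.g.
    44.1 kHz audio unchanged would distort the spectrogram the model sees.
    """
    if sampling_rate != TARGET_SAMPLING_RATE:
        raise ValueError(
            f"expected {TARGET_SAMPLING_RATE} Hz audio, got {sampling_rate} Hz; resample first"
        )
    if audio.ndim != 1:
        raise ValueError("expected mono audio (a 1-D array)")
    window = window_seconds * sampling_rate  # samples per window (480,000 for 30 s)
    return [audio[start:start + window] for start in range(0, len(audio), window)]


if __name__ == "__main__":
    # 75 s of silence -> two full 30 s windows plus one 15 s remainder
    silence = np.zeros(75 * TARGET_SAMPLING_RATE, dtype=np.float32)
    windows = split_into_windows(silence, TARGET_SAMPLING_RATE)
    print([len(w) / TARGET_SAMPLING_RATE for w in windows])  # [30.0, 30.0, 15.0]
```

Each window can then be passed through the processor and `model.generate` as in the usage example, and the decoded strings joined in order.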
Training Infrastructure
- Training framework: 🤗 Transformers
- Python version: 3.8+
- Key dependencies:
  - transformers
  - torch
  - datasets
  - numpy
Citation
If you use this model in your research, please cite:
@misc{susurro2024,
  author = {Your Name},
  title = {Susurro: Fine-tuned Whisper Model for Spanish Speech Recognition},
  year = {2024},
  publisher = {Hugging Face},
  journal = {Hugging Face Model Hub},
  howpublished = {\url{https://huggingface.co/IsmaelRR/SusurroModel-WhisperTurboV3Spanish}}
}
License
MIT
Acknowledgements
This model builds upon the OpenAI Whisper model and was trained using the Hugging Face Transformers library. Special thanks to the open-source community and contributors.