Whisper Fine-tuning for English-Malay Code-Switching
This project provides a configurable way to fine-tune OpenAI's Whisper model specifically on the EN-MALAY-CS (English-Malay Code-Switching) dataset.
Features
- Flexible Configuration: All parameters are configurable through YAML files
- Multi-GPU Support: Automatic detection and support for multiple GPUs
- Dynamic Language Selection: Train on any subset of supported languages
- On-the-fly Processing: Efficient memory usage with dynamic audio preprocessing
- Comprehensive Evaluation: Automatic evaluation on test sets
Configuration
All parameters are configurable through the config.yaml file, which is set up specifically for training on English-Malay code-switching data.
Model Configuration
- Model checkpoint (default: openai/whisper-large-v3)
- Maximum target length for sequences
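For orientation, a model block of this shape might look like the following sketch; the key names are assumptions, so consult the shipped config.yaml for the real schema:

```yaml
# Hypothetical sketch -- key names may differ from the shipped config.yaml
model:
  checkpoint: openai/whisper-large-v3  # any Whisper checkpoint from the Hub
  max_target_length: 448               # Whisper's decoder supports up to 448 tokens
```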
Dataset Configuration
- Uses the EN-MALAY-CS dataset only
- Dataset sources and splits
- Language-specific settings (subset ratios, validation sizes)
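For illustration only, a dataset block of this shape might look like the sketch below; every field name and value here is a guess at the schema, not taken from the actual file:

```yaml
# Hypothetical sketch -- field names and values are illustrative only
languages:
  malay:
    subset_ratio: 1.0     # fraction of the training split to use
    validation_size: 500  # number of held-out validation examples
datasets:
  - name: EN-MALAY-CS     # the only dataset this configuration targets
    splits: [train, validation, test]
```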
Training Configuration
- Learning rate, batch sizes, training steps
- Multi-GPU vs single GPU settings
- Evaluation and logging parameters
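As a rough illustration, a training block along these lines might look like the sketch below; all key names and values are assumptions, not copied from the shipped file:

```yaml
# Hypothetical sketch -- key names echo common Hugging Face Trainer arguments
training:
  learning_rate: 1.0e-5
  per_device_train_batch_size: 8
  max_steps: 5000
  eval_steps: 500        # how often to evaluate on the validation set
  logging_steps: 25
  push_to_hub: false     # disabled by default; see "Pushing to Hub" below
```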
Environment Configuration
- CPU core limits
- Environment variables for optimization
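A minimal sketch of what such an environment block could contain (variable names are assumptions):

```yaml
# Hypothetical sketch -- names are illustrative only
environment:
  cpu_cores: 8            # cap on cores used for data loading and preprocessing
  env_vars:
    OMP_NUM_THREADS: "8"  # a common knob for limiting CPU thread usage
```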
Pushing to Hub
- Pushing to the Hugging Face Hub is disabled by default. You can enable it by setting push_to_hub: true in your config file.
Usage
Basic Usage
python finetune.py --config config.yaml
Custom Configuration
python finetune.py --config my_custom_config.yaml
Multi-GPU Training
# Using torchrun (recommended) for two GPUs
torchrun --nproc_per_node=2 finetune.py --config config.yaml
Configuration File Structure
The config.yaml file is organized into the following sections:
- model: Model checkpoint and sequence length settings
- output: Output directory configuration
- environment: Environment variables and CPU settings
- audio: Audio processing settings (sampling rate)
- languages: Malay language configuration
- datasets: EN-MALAY-CS dataset configuration
- training: All training hyperparameters
- data_processing: Data processing settings
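Putting these together, the overall layout would look roughly like the skeleton below. The section names come from the list above; everything inside them is illustrative:

```yaml
# Skeleton only -- the keys inside each section are placeholders
model: {}              # checkpoint and sequence-length settings
output:
  dir: ./whisper-finetuned   # hypothetical output path
environment: {}        # CPU limits and environment variables
audio:
  sampling_rate: 16000 # Whisper models expect 16 kHz audio
languages: {}          # Malay language configuration
datasets: {}           # EN-MALAY-CS dataset configuration
training: {}           # training hyperparameters
data_processing: {}    # on-the-fly preprocessing settings
```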
Customizing Your Training
Adjusting Training Parameters
Modify the training section in config.yaml:
- Change learning rate, batch sizes, or training steps
- Adjust evaluation frequency
- Configure multi-GPU settings
Environment Optimization
Adjust the environment section to optimize for your system:
- Set CPU core limits
- Configure memory usage settings
Training Commands
Single GPU Training
python finetune.py
Multi-GPU Training
torchrun --nproc_per_node=2 finetune.py
Inference Guide
After training your model, you can use the provided inference.py script for speech recognition:
python inference.py
The inference script includes:
- Model loading from the trained checkpoint
- Audio preprocessing pipeline
- Text generation with proper formatting
- Support for English-Malay code-switching
Using the Trained Model
The inference script automatically handles:
- Loading the fine-tuned model weights
- Audio preprocessing with proper sampling rate
- Generating transcriptions with code-switching support
- Output formatting for evaluation metrics
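If you want to script transcription yourself instead of running inference.py, a minimal sketch using the transformers pipeline API is shown below. The checkpoint path is a placeholder for your trained output directory (or a Hub model id), and the real inference.py may differ in its details:

```python
# Minimal inference sketch -- illustrative, not the contents of inference.py
from transformers import pipeline

# Placeholder: point this at your fine-tuned checkpoint directory or a Hub id
asr = pipeline("automatic-speech-recognition", model="./whisper-finetuned")

# The pipeline decodes the file and resamples it to the 16 kHz rate Whisper expects
result = asr("sample.wav")
print(result["text"])
```

For long recordings, passing chunk_length_s=30 to the pipeline enables chunked decoding of audio beyond Whisper's 30-second window.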
Dependencies
Install required packages:
pip install -r requirements.txt
Key dependencies:
- PyYAML (for configuration loading)
- torch, transformers, datasets
- librosa (for audio processing)
- evaluate (for metrics)
Evaluation Results
| Language | Metric | Error Rate |
|---|---|---|
| En-Malay | WER | 42.33% |
Note: If you encounter issues running finetune.py, you can use the finetune-backup.py file, which contains the original hardcoded configuration used to generate these evaluation metrics.