metadata
license: cc-by-nc-nd-4.0
datasets:
  - openslr
language:
  - gl
pipeline_tag: automatic-speech-recognition
tags:
  - ITG
  - PyTorch
  - Transformers
  - wav2vec2

Wav2Vec2 Large XLSR Galician

Description

This is a fine-tuned version of the facebook/wav2vec2-large-xlsr-53 pre-trained model for automatic speech recognition (ASR) in Galician.


Dataset

The model was fine-tuned on the OpenSLR Galician dataset, available in the OpenSLR repository.


Example inference script

The following example script runs the model in inference mode:

import librosa
import torch
from transformers import AutoProcessor, AutoModelForCTC

filename = "demo.wav"  # change this line to the name of your audio file
sample_rate = 16_000   # the model expects 16 kHz audio

# Load the processor (feature extractor + tokenizer) and the CTC model
processor = AutoProcessor.from_pretrained('ITG/wav2vec2-large-xlsr-gl')
model = AutoModelForCTC.from_pretrained('ITG/wav2vec2-large-xlsr-gl')
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model.to(device)

# librosa resamples the audio to sample_rate while loading
speech_array, _ = librosa.load(filename, sr=sample_rate)
inputs = processor(speech_array, sampling_rate=sample_rate, return_tensors="pt", padding=True).to(device)

with torch.no_grad():
  logits = model(inputs.input_values, attention_mask=inputs.attention_mask).logits

# Greedy CTC decoding: pick the most likely token per frame, then decode
decode_output = processor.batch_decode(torch.argmax(logits, dim=-1))[0]
print(f"ASR Galician wav2vec2-large-xlsr output: {decode_output}")
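
The last step of the script takes the arg-max over the logits and hands the result to `batch_decode`. As a rough illustration of what greedy CTC decoding does with those per-frame ids, here is a minimal pure-Python sketch. The toy vocabulary, the blank id of `0`, and the `|` word delimiter are assumptions chosen for the example; they are not read from this model's tokenizer.

```python
from itertools import groupby


def ctc_greedy_decode(frame_ids, id_to_token, blank_id=0, word_delimiter="|"):
    """Collapse repeated frame predictions, drop blanks, and join the tokens."""
    # 1. Merge consecutive duplicates, e.g. [2, 2, 0, 2] -> [2, 0, 2]
    collapsed = [token_id for token_id, _ in groupby(frame_ids)]
    # 2. Remove the CTC blank symbol
    kept = [token_id for token_id in collapsed if token_id != blank_id]
    # 3. Map ids to characters and turn the word delimiter into a space
    text = "".join(id_to_token[token_id] for token_id in kept)
    return text.replace(word_delimiter, " ").strip()


# Hypothetical toy vocabulary (NOT the real vocabulary of this model)
vocab = {0: "<pad>", 1: "|", 2: "o", 3: "l", 4: "a"}

# One predicted id per audio frame, as torch.argmax(logits, dim=-1) would give
frames = [2, 2, 0, 3, 3, 3, 0, 4, 1, 1, 2, 0, 3, 4, 4]
decoded = ctc_greedy_decode(frames, vocab)
print(decoded)
```

The blank symbol is what lets CTC emit a doubled letter: `[4, 0, 4]` decodes to two separate `a` characters, whereas `[4, 4]` collapses to one.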

Fine-tuning hyper-parameters

| Hyper-parameter | Value |
|---|---|
| Training batch size | 16 |
| Evaluation batch size | 8 |
| Learning rate | 3e-4 |
| Gradient accumulation steps | 2 |
| Group by length | true |
| Evaluation strategy | steps |
| Max training epochs | 50 |
| Max steps | 4000 |
| Generate max length | 225 |
| FP16 | true |
| Metric for best model | wer |
| Greater is better | false |
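
To make the interaction between these values easier to see, the sketch below writes them as a plain dictionary and derives the effective batch size. The key names follow the argument names of `transformers.TrainingArguments`, but that mapping is an assumption, as is reading the batch sizes as per-device values on a single device. The original training script is not shown here.

```python
# Hyper-parameters from the table, keyed by the (assumed) TrainingArguments names
hyperparameters = {
    "per_device_train_batch_size": 16,
    "per_device_eval_batch_size": 8,
    "learning_rate": 3e-4,
    "gradient_accumulation_steps": 2,
    "group_by_length": True,
    "evaluation_strategy": "steps",
    "num_train_epochs": 50,
    "max_steps": 4000,
    "generation_max_length": 225,  # a Seq2SeqTrainingArguments field
    "fp16": True,
    "metric_for_best_model": "wer",
    "greater_is_better": False,  # a lower WER means a better checkpoint
}

# Gradients are accumulated over several forward passes before each optimizer step
effective_batch_size = (
    hyperparameters["per_device_train_batch_size"]
    * hyperparameters["gradient_accumulation_steps"]
)

# Upper bound on training examples processed, assuming a single device
max_examples_seen = effective_batch_size * hyperparameters["max_steps"]

print(f"Effective batch size: {effective_batch_size}")
print(f"Examples seen in {hyperparameters['max_steps']} steps: {max_examples_seen}")
```

In the Hugging Face `Trainer`, a positive `max_steps` overrides `num_train_epochs`. Under that assumption, the 4000-step limit rather than the 50-epoch limit would decide when training stops.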

Fine-tuning on a different dataset or style

If you're interested in fine-tuning your own wav2vec2 model, we suggest starting with the facebook/wav2vec2-large-xlsr-53 model. Additionally, you may find this fine-tuning on Galician notebook by Diego Fustes to be a valuable resource. It served as a helpful reference while training this Galician wav2vec2-large-xlsr model!
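
When fine-tuning on your own data, checkpoints are usually compared by word error rate (WER), the metric named in the hyper-parameter table above. The function below is a minimal, dependency-free sketch of WER as word-level edit distance divided by the number of reference words. The name `word_error_rate` and the example sentences are illustrative only; established packages may normalize text differently.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")

    # previous_row[j] = edit distance between the reference words processed
    # so far and the first j hypothesis words
    previous_row = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        current_row = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution = previous_row[j - 1] + (ref_word != hyp_word)
            deletion = previous_row[j] + 1      # reference word missing from hypothesis
            insertion = current_row[j - 1] + 1  # extra word in hypothesis
            current_row.append(min(substitution, deletion, insertion))
        previous_row = current_row

    return previous_row[-1] / len(ref_words)


# Illustrative sentences: one substituted word ("días" -> "dias") and one dropped word ("a")
reference = "bos días a todos"
hypothesis = "bos dias todos"
print(word_error_rate(reference, hypothesis))
```

Because WER counts errors, a lower value means a better model, which is why "Greater is better" is set to false in the table.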