---
license: cc-by-nc-nd-4.0
datasets:
- openslr
language:
- gl
pipeline_tag: automatic-speech-recognition
tags:
- ITG
- PyTorch
- Transformers
- wav2vec2
---

# Wav2Vec2 Large XLSR Galician

## Description

This is a fine-tuned version of the [facebook/wav2vec2-large-xlsr-53](https://huggingface.co/facebook/wav2vec2-large-xlsr-53) pre-trained model for automatic speech recognition (ASR) in Galician.

---

## Dataset

The model was fine-tuned on the [OpenSLR Galician](https://huggingface.co/datasets/openslr/viewer/SLR77) dataset (SLR77), available in the openslr repository.

---

## Example inference script

### Check this example script to run our model in inference mode

```python
import librosa
import torch
from transformers import AutoProcessor, AutoModelForCTC

filename = "demo.wav"  # change this line to the name of your audio file
sample_rate = 16_000   # the audio is resampled to 16 kHz on load

# Load the processor (feature extractor + tokenizer) and the CTC model.
processor = AutoProcessor.from_pretrained('ITG/wav2vec2-large-xlsr-gl')
model = AutoModelForCTC.from_pretrained('ITG/wav2vec2-large-xlsr-gl')

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model.to(device)

# Read the audio file as a mono float array at the target sample rate.
speech_array, _ = librosa.load(filename, sr=sample_rate)
inputs = processor(speech_array, sampling_rate=sample_rate, return_tensors="pt", padding=True).to(device)

# Run the model without tracking gradients and take the most likely token per frame.
with torch.no_grad():
    logits = model(inputs.input_values, attention_mask=inputs.attention_mask).logits

decode_output = processor.batch_decode(torch.argmax(logits, dim=-1))[0]

print(f"ASR Galician wav2vec2-large-xlsr output: {decode_output}")
```

---

## Fine-tuning hyper-parameters

| **Hyper-parameter**                      | **Value**                   |
|:----------------------------------------:|:---------------------------:|
| Training batch size                      | 16                          |
| Evaluation batch size                    | 8                           |
| Learning rate                            | 3e-4                        |
| Gradient accumulation steps              | 2                           |
| Group by length                          | true                        |
| Evaluation strategy                      | steps                       |
| Max training epochs                      | 50                          |
| Max steps                                | 4000                        |
| Generate max length                      | 225                         |
| FP16                                     | true                        |
| Metric for best model                    | wer                         |
| Greater is better                        | false                       |

## Fine-tuning in a different dataset or style

If you're interested in fine-tuning your own wav2vec2 model, we suggest starting with the [facebook/wav2vec2-large-xlsr-53 model](https://huggingface.co/facebook/wav2vec2-large-xlsr-53). You may also find this [notebook on fine-tuning XLSR-Wav2Vec2 on Galician by Diego Fustes](https://github.com/diego-fustes/xlsr-fine-tuning-gl/blob/main/Fine_Tune_XLSR_Wav2Vec2_on_Galician.ipynb) a valuable resource. It served as a helpful reference while training this Galician wav2vec2-large-xlsr model.
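
When adapting the hyper-parameters from the table above to your own run, it can help to see how the batch size and gradient accumulation combine. The following is a minimal sketch, not the training script used for this model: the dictionary key names are assumptions borrowed from the naming style of `transformers.TrainingArguments`, and `num_devices = 1` is an assumption, since this card does not state how many GPUs were used.

```python
# Values are taken from the hyper-parameter table in this card.
# Key names are assumed (TrainingArguments-style); the card does not
# say which training API or argument names were actually used.
hyperparams = {
    "per_device_train_batch_size": 16,
    "per_device_eval_batch_size": 8,
    "learning_rate": 3e-4,
    "gradient_accumulation_steps": 2,
    "group_by_length": True,
    "evaluation_strategy": "steps",
    "num_train_epochs": 50,
    "max_steps": 4000,
    "generation_max_length": 225,
    "fp16": True,
    "metric_for_best_model": "wer",
    "greater_is_better": False,  # lower WER is better
}

num_devices = 1  # assumption: not stated in the card

# Number of utterances that contribute to each optimizer update.
effective_batch_size = (
    hyperparams["per_device_train_batch_size"]
    * hyperparams["gradient_accumulation_steps"]
    * num_devices
)

# Upper bound on utterances processed if training runs for all max_steps.
samples_seen = effective_batch_size * hyperparams["max_steps"]

print(f"Effective batch size: {effective_batch_size}")
print(f"Utterances processed in {hyperparams['max_steps']} steps: {samples_seen}")
```

Under these assumptions, each optimizer update aggregates 32 utterances. If you train with the Hugging Face `Trainer`, note that a positive `max_steps` takes precedence over `num_train_epochs`, so the step budget rather than the epoch count would determine when training stops.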