---
language: lo
license: apache-2.0
tags:
- automatic-speech-recognition
- speech
- audio
- lao
- wav2vec2
- xls-r
datasets:
- h3llohihi/lao-asr-thesis-dataset
library_name: transformers
pipeline_tag: automatic-speech-recognition
metrics:
- cer
base_model:
- facebook/wav2vec2-xls-r-300m
---

# XLS-R Lao ASR

Fine-tuned XLS-R-300M model for Lao automatic speech recognition.

## Model Performance

- **Test CER**: 15.14%
- **Training Time**: 2.1 hours
- **Dialects**: Central, Northern, Southern Lao

## Usage

```python
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor
import torch
import librosa

# Load model and processor
model = Wav2Vec2ForCTC.from_pretrained("h3llohihi/xls-r-lao-asr")
processor = Wav2Vec2Processor.from_pretrained("h3llohihi/xls-r-lao-asr")

# Load audio (resampled to the 16 kHz rate the model expects)
audio, sr = librosa.load("audio.wav", sr=16000)

# Process audio
inputs = processor(audio, sampling_rate=16000, return_tensors="pt")

# Generate prediction
with torch.no_grad():
    logits = model(**inputs).logits

predicted_ids = torch.argmax(logits, dim=-1)
transcription = processor.batch_decode(predicted_ids)[0]
print(transcription)
```

## Citation

```bibtex
@thesis{naovalath2025lao,
  title={Lao Automatic Speech Recognition using Transfer Learning},
  author={Souphaxay Naovalath and Sounmy Chanthavong},
  advisor={Dr. Somsack Inthasone},
  school={National University of Laos, Faculty of Natural Sciences, Computer Science Department},
  year={2025}
}
```
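
## Evaluation

The reported test CER can be reproduced with the `evaluate` library. The sketch below is a minimal example, not the thesis evaluation script; the `test` split name and the `audio`/`text` column names of `h3llohihi/lao-asr-thesis-dataset` are assumptions and may need adjusting.

```python
import torch
import evaluate
from datasets import load_dataset
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

model = Wav2Vec2ForCTC.from_pretrained("h3llohihi/xls-r-lao-asr")
processor = Wav2Vec2Processor.from_pretrained("h3llohihi/xls-r-lao-asr")
model.eval()

# Assumed split and column names; adjust to the dataset's actual schema.
dataset = load_dataset("h3llohihi/lao-asr-thesis-dataset", split="test")
cer_metric = evaluate.load("cer")  # requires `jiwer` to be installed

predictions, references = [], []
for sample in dataset:
    audio = sample["audio"]  # assumed audio column
    inputs = processor(
        audio["array"], sampling_rate=audio["sampling_rate"], return_tensors="pt"
    )
    with torch.no_grad():
        logits = model(**inputs).logits
    predicted_ids = torch.argmax(logits, dim=-1)
    predictions.append(processor.batch_decode(predicted_ids)[0])
    references.append(sample["text"])  # assumed transcript column

print(f"CER: {cer_metric.compute(predictions=predictions, references=references):.4f}")
```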