# Korla/Wav2Vec2BertForCTC-hsb

## Model Description
Wav2Vec2BertForCTC-hsb is a Wav2Vec2-BERT model with a CTC head on top, fine-tuned for Upper Sorbian automatic speech recognition (ASR). It was trained with the Connectionist Temporal Classification (CTC) loss and transcribes Upper Sorbian audio to text.
## Usage
This model can be used for speech-to-text tasks on Upper Sorbian audio.
An optional 5-gram language model (`5gram.bin`) is provided for decoding with an external LM scorer. This n-gram model was trained on a corpus of Upper Sorbian Holy Masses, so it can improve decoding accuracy for religious or formal speech domains.
## Training Data
The model was fine-tuned on a dataset provided by the Foundation for the Sorbian People, which consists of high-quality recordings and transcripts in Upper Sorbian. The dataset includes diverse speakers and speech conditions, ensuring a robust acoustic model.
## Language Model

- **Name:** `5gram.bin`
- **Type:** 5-gram character-level KenLM language model
- **Domain:** Upper Sorbian religious speech (Holy Masses)
- **Usage:** for decoding with tools such as `pyctcdecode`'s `CTCDecoder`
## Limitations
- The model's accuracy may degrade on informal or highly dialectal speech not represented in the training data.
- The language model is domain-specific (religious speech) and may bias decoding toward that context.
- The model supports only Upper Sorbian, not Lower Sorbian or other Slavic languages.
## How to Use
For standard use (without the language model), load the model into a `transformers` pipeline.
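A minimal sketch of pipeline-based transcription; it assumes `transformers` and `torch` are installed, and the audio file path is a placeholder:

```python
# Sketch: load the fine-tuned checkpoint into an ASR pipeline.
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="Korla/Wav2Vec2BertForCTC-hsb",
)

# Transcribe a local audio file (placeholder path; any format
# readable by ffmpeg/soundfile works).
result = asr("sample_hsb.wav")
print(result["text"])
```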
To decode with the 5-gram language model, use the `pyctcdecode` library.
## Citation
Please cite as:
```bibtex
@misc{korla_wav2vec2bertforctc_hsb,
  author       = {Karl Baier},
  title        = {Wav2Vec2BertForCTC-hsb},
  year         = {2025},
  howpublished = {\url{https://huggingface.co/Korla/Wav2Vec2BertForCTC-hsb}},
}
```