# Korla/Wav2Vec2BertForCTC-hsb

## Model Description
Wav2Vec2BertForCTC-hsb is a Wav2Vec2-BERT model with a CTC head on top, fine-tuned for Upper Sorbian automatic speech recognition (ASR). It was trained with the Connectionist Temporal Classification (CTC) loss and transcribes Upper Sorbian audio to text.
## Usage
This model can be used for speech-to-text tasks on Upper Sorbian audio.
An optional 5-gram language model (`5gram.bin`) is provided for decoding with an external LM scorer. This n-gram model was trained on a corpus of Upper Sorbian Holy Masses, so it can improve decoding accuracy for religious or formal speech domains.
## Training Data
The model was fine-tuned on a dataset provided by the Foundation for the Sorbian People, which consists of high-quality recordings and transcripts in Upper Sorbian. The dataset includes diverse speakers and speech conditions, ensuring a robust acoustic model.
## Language Model

- **Name:** `5gram.bin`
- **Type:** 5-gram character-level KenLM language model
- **Domain:** Upper Sorbian religious speech (Holy Masses)
- **Usage:** for decoding with tools such as `pyctcdecode`'s `CTCDecoder`
## Limitations
- The model's accuracy may degrade on informal or highly dialectal speech not represented in the training data.
- The language model is domain-specific (religious speech) and may bias decoding toward that context.
- The model supports only Upper Sorbian, not Lower Sorbian or other Slavic languages.
## How to Use
For standard use (without the language model), load the model into a `transformers` pipeline.
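A minimal sketch of pipeline-based transcription; it assumes `transformers` and `torch` are installed, and the audio file path is a placeholder:

```python
# Sketch: load the fine-tuned checkpoint into an ASR pipeline.
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="Korla/Wav2Vec2BertForCTC-hsb",
)

# Transcribe a local audio file (placeholder path; any format
# readable by ffmpeg/soundfile works).
result = asr("sample_hsb.wav")
print(result["text"])
```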
To decode with the 5-gram language model, use the `pyctcdecode` library.
## Citation
Please cite as:
```bibtex
@misc{korla_wav2vec2bertforctc_hsb,
  author       = {Karl Baier},
  title        = {Wav2Vec2BertForCTC-hsb},
  year         = {2025},
  howpublished = {\url{https://huggingface.co/Korla/Wav2Vec2BertForCTC-hsb}},
}
```