U-CTC

U-CTC is an Urdu automatic speech recognition (ASR) model based on the Parakeet-CTC-0.6B architecture. It has been fine-tuned on ~21 hours of Urdu speech data using the NVIDIA NeMo framework. The model is optimized for CTC-based transcription of spoken Urdu.
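
For quick experimentation, a minimal inference sketch using the NeMo ASR API is shown below. It is a sketch under assumptions, not an official usage snippet: the checkpoint filename `U-CTC.nemo` and the audio path are placeholders, and the model class is the one used by the Parakeet-CTC family (a Conformer-style encoder with a CTC head).

```python
# Minimal inference sketch (assumes nemo_toolkit[asr] is installed and a
# local .nemo checkpoint is available; file names are placeholders).
import nemo.collections.asr as nemo_asr

# Restore the fine-tuned CTC model from a NeMo checkpoint file.
asr_model = nemo_asr.models.EncDecCTCModelBPE.restore_from("U-CTC.nemo")

# Transcribe one or more 16 kHz mono WAV files.
transcripts = asr_model.transcribe(["sample_urdu.wav"])

# Depending on the NeMo version, entries are plain strings or Hypothesis
# objects with a .text attribute.
print(transcripts[0])
```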


Model Summary

  • Model Name: U-CTC
  • Base Architecture: Parakeet-CTC-0.6B
  • Framework: NVIDIA NeMo
  • Language: Urdu
  • Model Type: Conformer Encoder + CTC Decoder (see the decoding sketch below)
  • Loss Function: CTC Loss
  • Hardware: Trained on NVIDIA RTX 3090
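
To make the CTC decoder and loss entries concrete, here is a toy, character-level illustration of greedy CTC decoding (argmax label per frame, collapse repeats, drop blanks). It is illustrative only: the actual model decodes over its own subword vocabulary, and `_` below simply stands in for the CTC blank token.

```python
# Toy greedy CTC decoding: take the most likely label per frame,
# merge consecutive repeats, then remove the blank symbol.
BLANK = "_"

def greedy_ctc_decode(frame_labels):
    """Collapse a per-frame label sequence into the CTC output string."""
    decoded = []
    prev = None
    for label in frame_labels:
        if label != prev and label != BLANK:
            decoded.append(label)
        prev = label
    return "".join(decoded)

# Repeats merge, and blanks separate genuinely repeated characters.
print(greedy_ctc_decode(list("hh_e_ll_ll_oo")))  # -> "hello"
```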

Training Configuration

Setting        Value
Epochs         69
Max Steps      14,800
Optimizer      AdamW
Learning Rate  0.001
Betas          (0.9, 0.98)
Weight Decay   0.001
Scheduler      CosineAnnealing
Warmup Steps   15,000
Min LR         0.0001
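
The optimizer and scheduler settings above map onto NeMo's standard optimization config. The sketch below is a hedged reconstruction of that configuration, not the exact training script: the key names follow the usual NeMo `optim`/`sched` schema, and the checkpoint filename is a placeholder.

```python
# Illustrative optimizer/scheduler setup mirroring the table above
# (key names follow the standard NeMo optimization config schema).
import nemo.collections.asr as nemo_asr
from omegaconf import OmegaConf

# Checkpoint name is a placeholder for the base or fine-tuned model.
asr_model = nemo_asr.models.EncDecCTCModelBPE.restore_from("U-CTC.nemo")

optim_cfg = OmegaConf.create({
    "name": "adamw",
    "lr": 0.001,
    "betas": [0.9, 0.98],
    "weight_decay": 0.001,
    "sched": {
        "name": "CosineAnnealing",
        "warmup_steps": 15000,
        "min_lr": 0.0001,
        "max_steps": 14800,
    },
})

# Attach the optimizer and LR schedule to the model before training.
asr_model.setup_optimization(optim_config=optim_cfg)
```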

Dataset

The model was trained and evaluated on a manually curated Urdu speech dataset:

Split        Files   Duration
Train        9,425   10.87 h
Validation   4,056   5.22 h
Test         4,056   5.22 h

  • Total audio: ~21.3 hours
  • Samples skipped due to CTC alignment failure: ~2.57% (see the sketch after this list)
  • Average acoustic model (AM) output sequence length: 50.39
  • Average target sequence length: 30.51
  • AM-to-target length ratio: ~1.83
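
The skipped samples reflect a basic CTC constraint: a valid alignment only exists when the acoustic model (AM) output sequence is at least as long as the target token sequence (slightly longer when consecutive target tokens repeat, since a blank must separate them). The filtering sketch below illustrates that check; the field names and example values are illustrative, not the actual data pipeline.

```python
# Toy pre-filtering sketch: skip samples for which no valid CTC alignment
# exists, i.e. the AM (encoder) output is shorter than the target sequence.
def ctc_alignable(am_len: int, target_len: int) -> bool:
    """Return True if an AM sequence of length am_len can emit target_len tokens."""
    return am_len >= target_len

samples = [
    {"am_len": 50, "target_len": 30},  # kept: AM-to-target ratio ~1.67
    {"am_len": 25, "target_len": 40},  # skipped: no valid CTC path
]
kept = [s for s in samples if ctc_alignable(s["am_len"], s["target_len"])]
skip_rate = 1 - len(kept) / len(samples)
print(f"kept {len(kept)} of {len(samples)} samples ({skip_rate:.1%} skipped)")
```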


