U-CTC

U-CTC is an Urdu automatic speech recognition (ASR) model based on the Parakeet-CTC-0.6B architecture. It has been fine-tuned on ~21 hours of Urdu speech data using the NVIDIA NeMo framework. The model is optimized for CTC-based transcription of spoken Urdu.
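
For quick experimentation, a minimal inference sketch using the NeMo ASR API is shown below. It is a sketch under assumptions, not an official usage snippet: the checkpoint filename `U-CTC.nemo` and the audio path are placeholders, and the model class is the one used by the Parakeet-CTC family (a Conformer-style encoder with a CTC head).

```python
# Minimal inference sketch (assumes nemo_toolkit[asr] is installed and a
# local .nemo checkpoint is available; file names are placeholders).
import nemo.collections.asr as nemo_asr

# Restore the fine-tuned CTC model from a NeMo checkpoint file.
asr_model = nemo_asr.models.EncDecCTCModelBPE.restore_from("U-CTC.nemo")

# Transcribe one or more 16 kHz mono WAV files.
transcripts = asr_model.transcribe(["sample_urdu.wav"])

# Depending on the NeMo version, entries are plain strings or Hypothesis
# objects with a .text attribute.
print(transcripts[0])
```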


Model Summary

  • Model Name: U-CTC
  • Base Architecture: Parakeet-CTC-0.6B
  • Framework: NVIDIA NeMo
  • Language: Urdu
  • Model Type: Conformer Encoder + CTC Decoder (see the decoding sketch below)
  • Loss Function: CTC Loss
  • Hardware: Trained on NVIDIA RTX 3090
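
To make the CTC decoder and loss entries concrete, here is a toy, character-level illustration of greedy CTC decoding (argmax label per frame, collapse repeats, drop blanks). It is illustrative only: the actual model decodes over its own subword vocabulary, and `_` below simply stands in for the CTC blank token.

```python
# Toy greedy CTC decoding: take the most likely label per frame,
# merge consecutive repeats, then remove the blank symbol.
BLANK = "_"

def greedy_ctc_decode(frame_labels):
    """Collapse a per-frame label sequence into the CTC output string."""
    decoded = []
    prev = None
    for label in frame_labels:
        if label != prev and label != BLANK:
            decoded.append(label)
        prev = label
    return "".join(decoded)

# Repeats merge, and blanks separate genuinely repeated characters.
print(greedy_ctc_decode(list("hh_e_ll_ll_oo")))  # -> "hello"
```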

Training Configuration

Setting        Value
Epochs         69
Max Steps      14,800
Optimizer      AdamW
Learning Rate  0.001
Betas          (0.9, 0.98)
Weight Decay   0.001
Scheduler      CosineAnnealing
Warmup Steps   15,000
Min LR         0.0001
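
The optimizer and scheduler settings above map onto NeMo's standard optimization config. The sketch below is a hedged reconstruction of that configuration, not the exact training script: the key names follow the usual NeMo `optim`/`sched` schema, and the checkpoint filename is a placeholder.

```python
# Illustrative optimizer/scheduler setup mirroring the table above
# (key names follow the standard NeMo optimization config schema).
import nemo.collections.asr as nemo_asr
from omegaconf import OmegaConf

# Checkpoint name is a placeholder for the base or fine-tuned model.
asr_model = nemo_asr.models.EncDecCTCModelBPE.restore_from("U-CTC.nemo")

optim_cfg = OmegaConf.create({
    "name": "adamw",
    "lr": 0.001,
    "betas": [0.9, 0.98],
    "weight_decay": 0.001,
    "sched": {
        "name": "CosineAnnealing",
        "warmup_steps": 15000,
        "min_lr": 0.0001,
        "max_steps": 14800,
    },
})

# Attach the optimizer and LR schedule to the model before training.
asr_model.setup_optimization(optim_config=optim_cfg)
```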

Dataset

The model was trained and evaluated on a manually curated Urdu speech dataset:

Split        Files   Duration
Train        9,425   10.87 h
Validation   4,056   5.22 h
Test         4,056   5.22 h

  • Total audio: ~21.3 hours
  • Samples skipped due to CTC alignment failure: ~2.57% (see the sketch after this list)
  • Average acoustic model (AM) output sequence length: 50.39
  • Average target sequence length: 30.51
  • AM-to-target length ratio: ~1.83
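
The skipped samples reflect a basic CTC constraint: a valid alignment only exists when the acoustic model (AM) output sequence is at least as long as the target token sequence (slightly longer when consecutive target tokens repeat, since a blank must separate them). The filtering sketch below illustrates that check; the field names and example values are illustrative, not the actual data pipeline.

```python
# Toy pre-filtering sketch: skip samples for which no valid CTC alignment
# exists, i.e. the AM (encoder) output is shorter than the target sequence.
def ctc_alignable(am_len: int, target_len: int) -> bool:
    """Return True if an AM sequence of length am_len can emit target_len tokens."""
    return am_len >= target_len

samples = [
    {"am_len": 50, "target_len": 30},  # kept: AM-to-target ratio ~1.67
    {"am_len": 25, "target_len": 40},  # skipped: no valid CTC path
]
kept = [s for s in samples if ctc_alignable(s["am_len"], s["target_len"])]
skip_rate = 1 - len(kept) / len(samples)
print(f"kept {len(kept)} of {len(samples)} samples ({skip_rate:.1%} skipped)")
```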


