# U-CTC
U-CTC is an Urdu automatic speech recognition (ASR) model based on the Parakeet-CTC-0.6B architecture. It has been fine-tuned on ~21 hours of Urdu speech data using the NVIDIA NeMo framework. The model is optimized for CTC-based transcription of spoken Urdu.
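For reference, below is a minimal transcription sketch using the NeMo ASR API. It assumes the fine-tuned checkpoint is available locally as a `.nemo` file; the filename `U-CTC.nemo` and the audio path are placeholders, not files shipped with this card.

```python
# Minimal inference sketch (assumes nemo_toolkit[asr] is installed and the
# fine-tuned checkpoint has been downloaded as a local .nemo file).
import nemo.collections.asr as nemo_asr

# Restore the fine-tuned CTC model; "U-CTC.nemo" is a placeholder filename.
asr_model = nemo_asr.models.ASRModel.restore_from("U-CTC.nemo")

# Transcribe one or more 16 kHz mono WAV files of spoken Urdu.
transcripts = asr_model.transcribe(["sample_urdu.wav"])
print(transcripts[0])
```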
## Model Summary
- Model Name: U-CTC
- Base Architecture: Parakeet-CTC-0.6B
- Framework: NVIDIA NeMo
- Language: Urdu
- Model Type: Conformer Encoder + CTC Decoder
- Loss Function: CTC Loss
- Hardware: Trained on NVIDIA RTX 3090
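Since U-CTC is a fine-tune of the Parakeet-CTC-0.6B base, the sketch below shows how such a fine-tune is typically started in NeMo: load the base checkpoint and swap in a language-specific tokenizer. The Urdu tokenizer directory is a hypothetical placeholder; the exact recipe used for U-CTC is not published here.

```python
# Sketch: load the Parakeet-CTC-0.6B base and swap in an Urdu BPE tokenizer
# before fine-tuning. The tokenizer directory is a hypothetical placeholder.
import nemo.collections.asr as nemo_asr

base = nemo_asr.models.EncDecCTCModelBPE.from_pretrained(
    model_name="nvidia/parakeet-ctc-0.6b"
)

# Point the model at an Urdu SentencePiece/BPE tokenizer built from the
# training transcripts (directory name assumed).
base.change_vocabulary(
    new_tokenizer_dir="tokenizers/urdu_bpe",
    new_tokenizer_type="bpe",
)
```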
## Training Configuration
| Setting | Value |
|---|---|
| Epochs | 69 |
| Max Steps | 14,800 |
| Optimizer | AdamW |
| Learning Rate | 0.001 |
| Betas | (0.9, 0.98) |
| Weight Decay | 0.001 |
| Scheduler | CosineAnnealing |
| Warmup Steps | 15,000 |
| Min LR | 0.0001 |
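The values above map onto NeMo's standard optimizer/scheduler configuration. A hedged sketch of how they might be expressed is shown below; the layout of the actual training script (trainer, dataset paths, etc.) is assumed and not shown.

```python
# Sketch of the optimizer/scheduler settings from the table above, expressed as
# the config NeMo's setup_optimization() expects.
from omegaconf import OmegaConf

optim_cfg = OmegaConf.create({
    "name": "adamw",
    "lr": 0.001,
    "betas": [0.9, 0.98],
    "weight_decay": 0.001,
    "sched": {
        "name": "CosineAnnealing",
        "warmup_steps": 15000,
        "min_lr": 0.0001,
    },
})

# With a restored NeMo ASR model (see the inference sketch above), this would
# be applied before training, e.g.:
# asr_model.setup_optimization(optim_config=optim_cfg)
```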
## Dataset
The model was trained and evaluated on a manually curated Urdu speech dataset:
| Split | Files | Duration |
|---|---|---|
| Train | 9,425 | 10.87 h |
| Validation | 4,056 | 5.22 h |
| Test | 4,056 | 5.22 h |
- Total audio: ~21.3 hours
- Samples skipped due to CTC alignment failure: ~2.57% (see the length-check sketch after this list)
- Average acoustic-model (AM) output sequence length: 50.39
- Average target sequence length: 30.51
- AM-to-target length ratio: ~1.83
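CTC cannot align a target sequence that is longer than the encoder output, which is the usual reason such samples are skipped during data preparation. Below is a hedged sketch of that pre-filtering step over a NeMo-style JSON-lines manifest; the ~80 ms encoder frame duration (8x subsampling of 10 ms features), the word-level token proxy, and the manifest filename are all assumptions.

```python
# Sketch: skip samples whose token sequence is longer than the number of
# encoder frames, since CTC loss cannot align them. Assumes a NeMo-style
# JSON-lines manifest with "duration" (seconds) and "text" fields, a
# word-count proxy for the tokenizer output, and ~80 ms per encoder frame --
# all of these are assumptions.
import json

FRAME_SEC = 0.08  # assumed encoder frame duration

def keep_sample(entry: dict) -> bool:
    num_frames = int(entry["duration"] / FRAME_SEC)
    num_tokens = len(entry["text"].split())  # proxy; real count comes from the tokenizer
    return num_frames >= num_tokens

with open("train_manifest.json", encoding="utf-8") as f:  # hypothetical manifest path
    entries = [json.loads(line) for line in f]

kept = [e for e in entries if keep_sample(e)]
print(f"kept {len(kept)} / {len(entries)} samples "
      f"({100 * (1 - len(kept) / len(entries)):.2f}% skipped)")
```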
## Model Tree
- Model: mahwizzzz/U-CTC
- Base model: nvidia/parakeet-ctc-0.6b