Whisper Small Hy - Erik Mkrtchyan

This model is a fine-tuned version of openai/whisper-small on the Hy Generated Audio Data with CV 20.0 dataset. It achieves the following results on the evaluation set:

Loss: 0.1185
Wer: 26.9720
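
The Wer figure appears to be a word error rate expressed as a percentage. As a rough illustration of how such a score is computed, here is a minimal sketch based on word-level edit distance. It is not the evaluation script used for this model; the function name and the example strings are hypothetical, and no text normalization is applied.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER in percent: word-level edit distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return 100.0 * prev[-1] / len(ref)


# One substituted word out of four reference words.
print(word_error_rate("բարև ձեզ ինչպես եք", "բարև ձեզ ինչպես ես"))  # 25.0
```

Under this definition, a score of about 27 would mean roughly one word-level error for every four reference words.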

Model description

This model is based on OpenAI's Whisper Small and fine-tuned for Armenian using a combination of real and synthetic audio data. It is designed to transcribe Armenian speech into text.
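
A minimal usage sketch, assuming the checkpoint is published on the Hugging Face Hub as ErikMkrtchyan/whisper-small-hy (inferred from the author and model names on this card) and that the transformers library is installed. The function name and the audio path are hypothetical, and passing the language through generate_kwargs is one common way to steer Whisper rather than a documented requirement of this model.

```python
MODEL_ID = "ErikMkrtchyan/whisper-small-hy"  # assumed Hub repository id


def transcribe(audio_path: str) -> str:
    """Transcribe a single Armenian audio file and return the text."""
    # Imported lazily so that defining this function does not require
    # transformers to be installed.
    from transformers import pipeline

    asr = pipeline("automatic-speech-recognition", model=MODEL_ID)
    result = asr(
        audio_path,
        # Ask Whisper to transcribe (not translate) Armenian speech.
        generate_kwargs={"language": "armenian", "task": "transcribe"},
    )
    return result["text"]


# Example call (the file name is a placeholder):
# print(transcribe("sample_hy.wav"))
```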

Intended uses & limitations

Intended Uses:

Armenian speech-to-text applications
Research on ASR for low-resource languages
Educational and experimental projects involving Whisper models

Limitations:

May not generalize well to accents or noisy audio not represented in the training set
The model may hallucinate text or produce inaccurate transcriptions, especially on unusual or out-of-distribution inputs, due to the inclusion of TTS-generated synthetic data in training.

Training and evaluation data

The dataset contains both real and high-quality synthetic Armenian speech clips.

Split	# Clips	Duration (hours)
`train`	9,300	13.53
`test`	5,818	9.16
`eval`	5,856	8.76
`generated`	100,000	113.61

Total duration: ~145 hours
Train set duration (train + generated): ~127 hours
Test set duration (test + eval): ~18 hours
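
The rounded totals can be checked directly against the per-split figures. The short sketch below only re-adds the numbers from the table; the variable names are illustrative.

```python
# (clips, hours) per split, copied from the table above.
splits = {
    "train": (9_300, 13.53),
    "test": (5_818, 9.16),
    "eval": (5_856, 8.76),
    "generated": (100_000, 113.61),
}

total_hours = sum(hours for _, hours in splits.values())
train_hours = splits["train"][1] + splits["generated"][1]
held_out_hours = splits["test"][1] + splits["eval"][1]
train_clips = splits["train"][0] + splits["generated"][0]

print(round(total_hours, 2))     # 145.06
print(round(train_hours, 2))     # 127.14
print(round(held_out_hours, 2))  # 17.92
print(train_clips)               # 109300
```

Synthetic clips therefore account for about 89% of the training audio (113.61 of 127.14 hours).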

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 1e-05
train_batch_size: 16
eval_batch_size: 8
seed: 42
optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 500
training_steps: 15000
mixed_precision_training: Native AMP
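
To make the schedule concrete, the sketch below computes the learning rate a linear scheduler with warmup would produce for the values listed above. It is a plain-Python approximation of the usual linear warmup and decay rule, not code taken from the training run; the function name is hypothetical.

```python
def linear_lr(step: int,
              base_lr: float = 1e-5,
              warmup_steps: int = 500,
              total_steps: int = 15_000) -> float:
    """Learning rate at a given optimizer step under linear warmup, then linear decay."""
    if step < warmup_steps:
        # Ramp up from 0 to base_lr over the warmup steps.
        return base_lr * step / max(1, warmup_steps)
    # Decay linearly from base_lr at the end of warmup to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


for step in (0, 250, 500, 7_750, 15_000):
    print(step, linear_lr(step))
```

Under these assumptions the rate peaks at 1e-05 at step 500, is about half of that at step 7,750, and reaches zero at step 15,000.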

Training results

Training Loss	Epoch	Step	Validation Loss	Wer
0.1118	0.1464	1000	0.2371	48.5991
0.0959	0.2927	2000	0.1895	41.1675
0.0862	0.4391	3000	0.1716	38.6837
0.0741	0.5855	4000	0.1572	35.3540
0.0708	0.7319	5000	0.1443	33.0242
0.0558	0.8782	6000	0.1352	31.4380
0.0467	1.0246	7000	0.1315	30.2390
0.0528	1.1710	8000	0.1295	29.9233
0.0455	1.3173	9000	0.1280	29.2490
0.0347	1.4637	10000	0.1246	28.9718
0.0490	1.6101	11000	0.1221	28.5274
0.0419	1.7564	12000	0.1189	27.9543
0.0371	1.9028	13000	0.1166	27.5242
0.0286	2.0492	14000	0.1173	27.0149
0.0301	2.1956	15000	0.1185	26.9720
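
The Epoch column can be related to the dataset size with a little arithmetic. The sketch below assumes a single device, no gradient accumulation, and a final partial batch that is kept; none of these is stated on this card, but under those assumptions the numbers reproduce the reported epoch values.

```python
import math

train_clips = 9_300 + 100_000  # train + generated splits
train_batch_size = 16

# Optimizer steps needed to see every training clip once.
steps_per_epoch = math.ceil(train_clips / train_batch_size)


def epoch_at(step: int) -> float:
    """Fraction of epochs completed after a given number of steps."""
    return step / steps_per_epoch


print(steps_per_epoch)              # 6832
print(round(epoch_at(1_000), 4))    # 0.1464
print(round(epoch_at(15_000), 4))   # 2.1956
```

In other words, 15,000 steps correspond to a little under 2.2 passes over the combined real and synthetic training data.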

Framework versions

Transformers 4.51.3
Pytorch 2.7.0+cu126
Datasets 3.6.0
Tokenizers 0.21.1

ErikMkrtchyan/whisper-small-hy