islomov committed on
Commit df9fae2 · verified · 1 parent: c6e59ec

Update README.md

Files changed (1)
  1. README.md +6 -6
README.md CHANGED
@@ -35,12 +35,12 @@ Support my works and open-source movement: https://tirikchilik.uz/islomovs
 ## Training Data
 
 This model was fine-tuned on approximately 475 hours of diverse Uzbek audio data including:
-- Publicly available podcasts
-- Tashkent dialect podcasts
-- News
-- Google fleurs
-- USC
-- Common Voice 17 dataset
+- Common Voice 17 dataset (filtered)
+- USC (filtered)
+- Google fleurs (filtered)
+- Podcasts Tashkent Dialect Youtube Uzbek Speech Dataset: [Link HF](https://huggingface.co/datasets/islomov/podcasts_tashkent_dialect_youtube_uzbek_speech_dataset)
+- News Youtube Uzbek Speech Dataset: [Link HF](https://huggingface.co/datasets/islomov/news_youtube_uzbek_speech_dataset)
+- IT Youtube Uzbek Speech Dataset: [Link HF](https://huggingface.co/datasets/islomov/it_youtube_uzbek_speech_dataset)
 
 The dataset consisted of 50% human-transcribed and 50% pseudo-transcribed material (using Gemini 2.5 Pro). Special attention was given to Tashkent dialect audio materials to ensure strong performance on this dialect.
 
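
The added README lines link three datasets on the Hugging Face Hub. As a rough illustration of how a reader might inspect them, here is a minimal sketch that derives the repository IDs from those URLs and, optionally, streams a first example. The helper names `repo_id_from_url` and `peek` are hypothetical, the split name `"train"` is an assumption (the README does not state the datasets' splits or columns), and `peek` needs the `datasets` library and network access.

```python
from urllib.parse import urlparse

# Dataset links exactly as they appear in the updated README.
DATASET_URLS = [
    "https://huggingface.co/datasets/islomov/podcasts_tashkent_dialect_youtube_uzbek_speech_dataset",
    "https://huggingface.co/datasets/islomov/news_youtube_uzbek_speech_dataset",
    "https://huggingface.co/datasets/islomov/it_youtube_uzbek_speech_dataset",
]


def repo_id_from_url(url: str) -> str:
    """Turn a Hub dataset URL into its 'namespace/name' repository ID."""
    parts = urlparse(url).path.strip("/").split("/")
    if len(parts) != 3 or parts[0] != "datasets":
        raise ValueError(f"not a Hub dataset URL: {url}")
    return f"{parts[1]}/{parts[2]}"


def peek(repo_id: str, split: str = "train", n: int = 1):
    """Stream the first n examples without downloading the whole dataset.

    Assumption: the split is called "train"; adjust if the dataset differs.
    """
    from datasets import load_dataset  # lazy import: only needed for streaming

    stream = load_dataset(repo_id, split=split, streaming=True)
    return list(stream.take(n))


REPO_IDS = [repo_id_from_url(u) for u in DATASET_URLS]

if __name__ == "__main__":
    for repo_id in REPO_IDS:
        print(repo_id)
```

Calling `peek(REPO_IDS[0])` would return a list with one example dictionary, provided the assumed split exists. Streaming is used here so that a quick look does not require fetching the full audio corpus.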