Update README.md
README.md
CHANGED
@@ -35,12 +35,12 @@ Support my works and open-source movement: https://tirikchilik.uz/islomovs
 ## Training Data

 This model was fine-tuned on approximately 475 hours of diverse Uzbek audio data including:
--
--
--
--
--
--
+- Common Voice 17 dataset (filtered)
+- USC (filtered)
+- Google fleurs (filtered)
+- Podcasts Tashkent Dialect Youtube Uzbek Speech Dataset: [Link HF](https://huggingface.co/datasets/islomov/podcasts_tashkent_dialect_youtube_uzbek_speech_dataset)
+- News Youtube Uzbek Speech Dataset: [Link HF](https://huggingface.co/datasets/islomov/news_youtube_uzbek_speech_dataset)
+- IT Youtube Uzbek Speech Dataset: [Link HF](https://huggingface.co/datasets/islomov/it_youtube_uzbek_speech_dataset)

 The dataset consisted of 50% human-transcribed and 50% pseudo-transcribed material (using Gemini 2.5 Pro). Special attention was given to Tashkent dialect audio materials to ensure strong performance on this dialect.
