Update README.md
README.md
CHANGED
@@ -14,6 +14,16 @@ hardware: H100
|
|
14 |
|
15 |
This is an Arabic text-to-speech model based on the StyleTTS2 architecture and adapted for Arabic speech synthesis. It produces good-quality Arabic speech but is not yet state-of-the-art, and further experimentation is needed to optimize it for Arabic. All training objectives from the original StyleTTS2 were kept except the WavLM objectives, which were removed because they were designed primarily for English speech.
|
16 |
|
17 |
## Efficiency and Performance
|
18 |
|
19 |
A key strength of this model lies in its efficiency and performance characteristics:
|
@@ -25,15 +35,6 @@ A key strength of this model lies in its efficiency and performance characterist
|
|
25 |
|
26 |
Note: According to the StyleTTS2 authors, training a single-speaker model from scratch should perform better than fine-tuning. We did not attempt this because of limited computational resources, so more extensive training may yield better results.
|
27 |
|
28 |
-
## Example
|
29 |
-
|
30 |
-
Here is an example output from the model:
|
31 |
-
|
32 |
-
#### Sample 1
|
33 |
-
<audio controls>
|
34 |
-
<source src="https://huggingface.co/fadi77/StyleTTS2-LibriTTS-arabic/resolve/main/synthesized_audio.wav" type="audio/wav">
|
35 |
-
Your browser does not support the audio element.
|
36 |
-
</audio>
|
37 |
|
38 |
## Model Details
|
39 |
|
|
|
14 |
|
15 |
This is an Arabic text-to-speech model based on the StyleTTS2 architecture and adapted for Arabic speech synthesis. It produces good-quality Arabic speech but is not yet state-of-the-art, and further experimentation is needed to optimize it for Arabic. All training objectives from the original StyleTTS2 were kept except the WavLM objectives, which were removed because they were designed primarily for English speech.
|
16 |
|
17 |
+
## Example
|
18 |
+
|
19 |
+
Here is an example output from the model:
|
20 |
+
|
21 |
+
#### Sample 1
|
22 |
+
<audio controls>
|
23 |
+
<source src="https://huggingface.co/fadi77/StyleTTS2-LibriTTS-arabic/resolve/main/synthesized_audio.wav" type="audio/wav">
|
24 |
+
Your browser does not support the audio element.
|
25 |
+
</audio>
|
26 |
+
|
27 |
## Efficiency and Performance
|
28 |
|
29 |
A key strength of this model lies in its efficiency and performance characteristics:
|
|
|
35 |
|
36 |
Note: According to the StyleTTS2 authors, training a single-speaker model from scratch should perform better than fine-tuning. We did not attempt this because of limited computational resources, so more extensive training may yield better results.
|
37 |
|
38 |
|
39 |
## Model Details
|
40 |
|