TawasulAI/tawasul-egy-stt

@@ -39,28 +39,6 @@ The model transcribes text in Arabic without diacritical marks and supports peri
 This model is ready for commercial and non-commercial use.
-## License
-License to use this model is covered by the [CC-BY-4.0](https://creativecommons.org/licenses/by/4.0/). By downloading the public and release version of the model, you accept the terms and conditions of the [CC-BY-4.0](https://creativecommons.org/licenses/by/4.0/) license.
-## References
-[1] [Fast Conformer with Linearly Scalable Attention for Efficient Speech Recognition](https://arxiv.org/abs/2305.05084)
-[2] [Google Sentencepiece Tokenizer](https://github.com/google/sentencepiece)
-[3] [NVIDIA NeMo Toolkit](https://github.com/NVIDIA/NeMo)
-[4] [Open Universal Arabic ASR Leaderboard](https://huggingface.co/spaces/elmresearchcenter/open_universal_arabic_asr_leaderboard)
-<!-- ## NVIDIA NeMo: Training
-To train, fine-tune or play with the model you will need to install [NVIDIA NeMo](https://github.com/NVIDIA/NeMo).
-We recommend you install it after you've installed latest Pytorch version.
-```
-pip install nemo_toolkit['all']
-```
- -->
 ## Model Architecture
 FastConformer [1] is an optimized version of the Conformer model with 8x depthwise-separable convolutional downsampling.
@@ -83,7 +61,6 @@ This model provides transcribed speech as a string for a given audio sample.
   - **Other Properties Related to Output:** May Need Inverse Text Normalization; Does Not Handle Special Characters; Outputs text in Arabic without diacritical marks
 ## Limitations
 The model is non-streaming and outputs the speech as a string without diacritical marks.
 Not recommended for word-for-word transcription and punctuation as accuracy varies based on the characteristics of input audio (unrecognized word, accent, noise, speech type, and context of speech).
@@ -200,4 +177,18 @@ asr_model.transcribe(['sample_audio_1.wav', 'sample_audio_2.wav', 'sample_audio_
 - Model outputs text in Arabic without diacritical marks
 - Output text requires Inverse Text Normalization
 - The model is noise-sensitive
-- The model is Egyptian Dialect further finetuned
+- The model is further fine-tuned on the Egyptian dialect
+## License
+License to use this model is covered by the [CC-BY-4.0](https://creativecommons.org/licenses/by/4.0/). By downloading the public and release version of the model, you accept the terms and conditions of the [CC-BY-4.0](https://creativecommons.org/licenses/by/4.0/) license.
+## References
+[1] [Fast Conformer with Linearly Scalable Attention for Efficient Speech Recognition](https://arxiv.org/abs/2305.05084)
+[2] [Google Sentencepiece Tokenizer](https://github.com/google/sentencepiece)
+[3] [NVIDIA NeMo Toolkit](https://github.com/NVIDIA/NeMo)
+[4] [Open Universal Arabic ASR Leaderboard](https://huggingface.co/spaces/elmresearchcenter/open_universal_arabic_asr_leaderboard)