Update README.md
README.md
CHANGED
@@ -39,28 +39,6 @@ The model transcribes text in Arabic without diacritical marks and supports peri
 
 This model is ready for commercial and non-commercial use.
 
-## License
-
-License to use this model is covered by the [CC-BY-4.0](https://creativecommons.org/licenses/by/4.0/). By downloading the public and release version of the model, you accept the terms and conditions of the [CC-BY-4.0](https://creativecommons.org/licenses/by/4.0/) license.
-
-## References
-
-[1] [Fast Conformer with Linearly Scalable Attention for Efficient Speech Recognition](https://arxiv.org/abs/2305.05084)
-
-[2] [Google Sentencepiece Tokenizer](https://github.com/google/sentencepiece)
-
-[3] [NVIDIA NeMo Toolkit](https://github.com/NVIDIA/NeMo)
-
-[4] [Open Universal Arabic ASR Leaderboard](https://huggingface.co/spaces/elmresearchcenter/open_universal_arabic_asr_leaderboard)
-
-<!-- ## NVIDIA NeMo: Training
-
-To train, fine-tune or play with the model you will need to install [NVIDIA NeMo](https://github.com/NVIDIA/NeMo).
-We recommend you install it after you've installed latest Pytorch version.
-```
-pip install nemo_toolkit['all']
-```
--->
 ## Model Architecture
 
 FastConformer [1] is an optimized version of the Conformer model with 8x depthwise-separable convolutional downsampling.
@@ -83,7 +61,6 @@ This model provides transcribed speech as a string for a given audio sample.
 - **Other Properties Related to Output:** May Need Inverse Text Normalization; Does Not Handle Special Characters; Outputs text in Arabic without diacritical marks
 
 ## Limitations
-
 The model is non-streaming and outputs the speech as a string without diacritical marks.
 Not recommended for word-for-word transcription and punctuation as accuracy varies based on the characteristics of input audio (unrecognized word, accent, noise, speech type, and context of speech).
 
@@ -200,4 +177,18 @@ asr_model.transcribe(['sample_audio_1.wav', 'sample_audio_2.wav', 'sample_audio_
 - Model outputs text in Arabic without diacritical marks
 - Output text requires Inverse Text Normalization
 - The model is noise-sensitive
-- The model is Egyptian Dialect further finetuned
+- The model is Egyptian Dialect further finetuned
+
+## License
+
+License to use this model is covered by the [CC-BY-4.0](https://creativecommons.org/licenses/by/4.0/). By downloading the public and release version of the model, you accept the terms and conditions of the [CC-BY-4.0](https://creativecommons.org/licenses/by/4.0/) license.
+
+## References
+
+[1] [Fast Conformer with Linearly Scalable Attention for Efficient Speech Recognition](https://arxiv.org/abs/2305.05084)
+
+[2] [Google Sentencepiece Tokenizer](https://github.com/google/sentencepiece)
+
+[3] [NVIDIA NeMo Toolkit](https://github.com/NVIDIA/NeMo)
+
+[4] [Open Universal Arabic ASR Leaderboard](https://huggingface.co/spaces/elmresearchcenter/open_universal_arabic_asr_leaderboard)
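
Note on the "without diacritical marks" limitation: because the model emits undiacritized Arabic, reference transcripts that contain diacritics probably need the same treatment before they are compared with model output. The sketch below is one possible way to do that with the Python standard library only. The helper name `strip_arabic_diacritics` is hypothetical; it is not part of the README, of NeMo, or of this commit, and dropping every Unicode nonspacing mark is an assumption about what "diacritical marks" covers here.

```python
import unicodedata


def strip_arabic_diacritics(text: str) -> str:
    """Hypothetical helper: drop combining marks such as fatha, sukun, shadda.

    Assumption: every Unicode nonspacing mark (category "Mn") counts as a
    diacritic. NFC normalization runs first so that decomposed letters
    (for example alef + hamza above) are composed into a single letter
    and keep their hamza instead of losing it with the other marks.
    """
    composed = unicodedata.normalize("NFC", text)
    return "".join(ch for ch in composed if unicodedata.category(ch) != "Mn")


# A diacritized word and its bare form, written with escapes for clarity.
diacritized = "\u0645\u064E\u0631\u0652\u062D\u064E\u0628\u064B\u0627"
bare = "\u0645\u0631\u062D\u0628\u0627"
print(strip_arabic_diacritics(diacritized) == bare)
```

Applying the helper to both the reference and the hypothesis keeps the comparison symmetric; whether further normalization is needed (digits, punctuation, Inverse Text Normalization) depends on the evaluation setup and is not covered here.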