Add pipeline tag, library name and link to Github repo
This PR adds the `pipeline_tag` and `library_name` to the model card metadata.
The `pipeline_tag` is set to `automatic-speech-recognition` as this model performs Automatic Speech Recognition.
The `library_name` is set to `transformers` because the model is readily usable with the Hugging Face Transformers library.
I also added a link to the GitHub repository in the overview.
These changes improve the model's discoverability on the Hugging Face Hub.
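For reference, setting `library_name: transformers` together with `pipeline_tag: automatic-speech-recognition` signals that the model can be loaded through the Transformers ASR pipeline. A minimal sketch of what that usage could look like (the checkpoint id `FBK-MT/fama-small-asr`, the `trust_remote_code` flag, and the audio path are assumptions here, not taken from the card):

```python
from transformers import pipeline


def transcribe(audio_path: str, model_id: str = "FBK-MT/fama-small-asr") -> str:
    """Transcribe an audio file with the Transformers ASR pipeline.

    The model id is an assumption based on the card's sibling links
    (e.g. FBK-MT/fama-medium-asr); trust_remote_code may be needed
    if the checkpoint ships custom modeling code.
    """
    asr = pipeline(
        "automatic-speech-recognition",  # matches the new pipeline_tag
        model=model_id,
        trust_remote_code=True,
    )
    return asr(audio_path)["text"]


# Example (downloads the checkpoint on first use):
# print(transcribe("sample.wav"))
```

The function is only defined here, not run, since invoking it fetches the model weights.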
README.md
CHANGED
```diff
@@ -1,19 +1,21 @@
 ---
-license: cc-by-4.0
-language:
-- en
-- it
 datasets:
 - FBK-MT/mosel
 - facebook/covost2
 - openslr/librispeech_asr
 - facebook/voxpopuli
+language:
+- en
+- it
+license: cc-by-4.0
 metrics:
 - wer
 tags:
 - speech
 - speech recognition
 - ASR
+library_name: transformers
+pipeline_tag: automatic-speech-recognition
 ---
 
 # FAMA-small-asr
@@ -40,7 +42,6 @@ All the artifacts used for realizing FAMA models, including codebase, datasets,
 themself are [released under OS-compliant licenses](#license), promoting a more
 responsible creation of models in our community.
 
-
 It is available in 2 sizes, with 2 variants for ASR only:
 
 - [FAMA-small](https://huggingface.co/FBK-MT/fama-small) - 475 million parameters
@@ -49,7 +50,7 @@ It is available in 2 sizes, with 2 variants for ASR only:
 - [FAMA-medium-asr](https://huggingface.co/FBK-MT/fama-medium-asr) - 878 million parameters
 
 For more information about FAMA, please check our [blog post](https://huggingface.co/blog/FAMA/release) and the [arXiv](https://arxiv.org/abs/2505.22759) preprint.
-
+The code is available in the [Github repository](https://github.com/hlt-mt/FBK-fairseq).
 
 ## Usage
 
@@ -124,7 +125,6 @@ We also benchmark FAMA in terms of computational time and maximum batch size sup
 - FAMA achieves up to 4.2 WER improvement on average across languages compared to OWSM v3.1
 - FAMA is up to 8 times faster than Whisper large-v3 while achieving comparable performance
 
-
 ### Automatic Speech Recogniton (ASR)
 | ***Model/Dataset WER (β)*** | **CommonVoice**-*en* | **CommonVoice**-*it* | **MLS**-*en* | **MLS**-*it* | **VoxPopuli**-*en* | **VoxPopuli**-*it* | **AVG**-*en* | **AVG**-*it* |
 |-----------------------------------------|---------|---------|---------|---------|---------|----------|---------|----------|
@@ -138,7 +138,6 @@ We also benchmark FAMA in terms of computational time and maximum batch size sup
 | FAMA *small* | 13.7 | 8.6 | 5.8 | 12.8 | 7.3 | **15.6** | 8.9 | 12.3 |
 | FAMA *medium* | 11.5 | 7.0 | 5.2 | 13.9 | 7.2 | 15.9 | 8.0 | 12.3 |
 
-
 ### Computational Time and Maximum Batch Size
 
 | ***Model*** | ***Batch Size*** | ***xRTF en (β)*** | ***xRTF it (β)*** | ***xRTF AVG (β)*** |
@@ -150,7 +149,6 @@ We also benchmark FAMA in terms of computational time and maximum batch size sup
 | FAMA *small* | 16 | **57.4** | **56.0** | **56.7** |
 | FAMA *medium* | 8 | 39.5 | 41.2 | 40.4 |
 
-
 ## License
 
 We release the FAMA model weights, and training data under the CC-BY 4.0 license.
```