Model Card for Model ID
Model Details
The model has 31,536,128 trainable parameters.
Model Description
The model uses an early-exit architecture: 12 Conformer layers and 6 CTC decoders, with one exit after every 2 encoder layers. The released weights were obtained by averaging the checkpoints from epoch 60 to epoch 90.
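The exit placement and the checkpoint averaging described above can be sketched as follows. This is a minimal illustration, not the code from the repository: `exit_layer_indices` and `average_checkpoints` are hypothetical helper names, and parameters are represented as plain lists of floats, whereas real checkpoints would hold tensors.

```python
from typing import Dict, List

# Values taken from the training configuration (Training Details).
N_ENC_LAYERS_PER_EXIT = 2
N_ENC_EXITS = 6


def exit_layer_indices(layers_per_exit: int, n_exits: int) -> List[int]:
    """Return the 1-based encoder depth at which each exit is attached."""
    return [layers_per_exit * (i + 1) for i in range(n_exits)]


def average_checkpoints(
    checkpoints: List[Dict[str, List[float]]]
) -> Dict[str, List[float]]:
    """Element-wise mean of parameter dictionaries that share the same keys.

    Hypothetical helper: parameters are flat lists of floats for illustration.
    """
    if not checkpoints:
        raise ValueError("need at least one checkpoint")
    averaged: Dict[str, List[float]] = {}
    for name in checkpoints[0]:
        # zip(*...) groups the i-th value of this parameter across checkpoints.
        columns = zip(*(ckpt[name] for ckpt in checkpoints))
        averaged[name] = [sum(values) / len(checkpoints) for values in columns]
    return averaged


print(exit_layer_indices(N_ENC_LAYERS_PER_EXIT, N_ENC_EXITS))
# -> [2, 4, 6, 8, 10, 12]

toy_checkpoints = [{"w": [1.0, 2.0]}, {"w": [3.0, 4.0]}]
print(average_checkpoints(toy_checkpoints))
# -> {'w': [2.0, 3.0]}
```

With 2 layers per exit and 6 exits, the last exit sits at layer 12, which matches the stated encoder depth.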
Uses
The model is intended for automatic speech recognition (ASR). Code for running it is available at https://github.com/SpeechTechLab/early-exit-transformer
How to Get Started with the Model
Use the code at https://github.com/SpeechTechLab/early-exit-transformer
Training Details
decoder_mode='ctc', model_type='early_conformer', bpe=True
distill=False, language_model=None, language_model_dict=None, avg_model_start=60, avg_model_end=90
max_len=2000, d_model=256, n_enc_layers_per_exit=2, n_enc_exits=6, n_dec_layers=6, n_heads=8
d_feed_forward=2048, depthwise_kernel_size=31, max_utterance_length=600, sample_rate=16000
n_fft=512, win_length=320, hop_length=160, n_mels=80
src_pad_idx=0, trg_pad_idx=126, trg_sos_idx=1, trg_eos_idx=2, enc_voc_size=256, dec_voc_size=256
sp=<sentencepiece.SentencePieceProcessor; model='aiXpa_en.bpe-256.model', lexicon='aiXpa_en-bpe-256.lex', tokens='aiXpa_en-bpe-256.tok'>
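A few quantities follow directly from the configuration values listed above. The sketch below derives them; the `ModelConfig` class and its property names are hypothetical and exist only for this illustration, while the numeric defaults are copied from the configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelConfig:
    # Values copied from the training configuration.
    d_model: int = 256
    n_heads: int = 8
    n_enc_layers_per_exit: int = 2
    n_enc_exits: int = 6
    sample_rate: int = 16000
    n_fft: int = 512
    win_length: int = 320
    hop_length: int = 160
    n_mels: int = 80

    @property
    def n_enc_layers(self) -> int:
        """Total encoder depth implied by the exit layout."""
        return self.n_enc_layers_per_exit * self.n_enc_exits

    @property
    def head_dim(self) -> int:
        """Per-head dimension of multi-head attention."""
        if self.d_model % self.n_heads:
            raise ValueError("d_model must be divisible by n_heads")
        return self.d_model // self.n_heads

    @property
    def window_ms(self) -> float:
        """Analysis window length in milliseconds."""
        return 1000.0 * self.win_length / self.sample_rate

    @property
    def hop_ms(self) -> float:
        """Frame shift in milliseconds."""
        return 1000.0 * self.hop_length / self.sample_rate


cfg = ModelConfig()
print(cfg.n_enc_layers, cfg.head_dim, cfg.window_ms, cfg.hop_ms)
# -> 12 32 20.0 10.0
```

In other words, the front end computes 80 mel features on 20 ms windows with a 10 ms shift, and each of the 8 attention heads works on 32 dimensions.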
Training Data
LibriSpeech, Voxpopuli, TEDLIUM release 3
Training Procedure
The model was trained from scratch.
Preprocessing [optional]
[More Information Needed]
Training Hyperparameters
shuffle=True, batch_size=64, n_batch_split=8, drop_prob=0.1, init_lr=1e-05, adam_eps=1e-09, weight_decay=0.0001, warmup=[training dataset size], clip=1.0
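The sketch below shows one plausible reading of two of these settings. It assumes that `n_batch_split` divides each batch into equal chunks processed one after another, and that `clip` is a maximum global gradient norm. Neither interpretation is confirmed by this card, and the function names are hypothetical.

```python
import math
from typing import List

# Values taken from the hyperparameter list.
BATCH_SIZE = 64
N_BATCH_SPLIT = 8
CLIP = 1.0


def micro_batch_size(batch_size: int, n_batch_split: int) -> int:
    """Size of each chunk, assuming the batch is split into equal parts."""
    if batch_size % n_batch_split:
        raise ValueError("batch_size must be divisible by n_batch_split")
    return batch_size // n_batch_split


def clip_by_global_norm(grads: List[float], max_norm: float) -> List[float]:
    """Rescale gradients so that their L2 norm does not exceed max_norm."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= max_norm:
        return list(grads)
    scale = max_norm / norm
    return [g * scale for g in grads]


print(micro_batch_size(BATCH_SIZE, N_BATCH_SPLIT))
# -> 8

# A gradient with norm 5 is scaled down to norm CLIP; the direction is kept.
clipped = clip_by_global_norm([3.0, 4.0], CLIP)
print(math.isclose(math.sqrt(sum(g * g for g in clipped)), CLIP))
# -> True
```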
Speeds, Sizes, Times [optional]
[More Information Needed]
Evaluation (%WER)
| LibriSpeech test-clean | Voxpopuli | TEDLIUM |
|---|---|---|
| 6.73 | 13.12 | 11.97 |
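For reference, word error rate is the word-level edit distance between hypothesis and reference, divided by the number of reference words. The following is a minimal sketch of that standard definition, not the scoring script used to produce the figures above. It assumes whitespace tokenization and no text normalization, and the function names are illustrative.

```python
from typing import List


def edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Levenshtein distance between two word sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer_percent(references: List[str], hypotheses: List[str]) -> float:
    """Corpus-level WER in percent: total errors over total reference words."""
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have the same length")
    errors = 0
    words = 0
    for ref, hyp in zip(references, hypotheses):
        ref_words = ref.split()
        errors += edit_distance(ref_words, hyp.split())
        words += len(ref_words)
    if words == 0:
        raise ValueError("references contain no words")
    return 100.0 * errors / words


# One substitution (sat -> sit) and one deletion (the) over 6 reference words.
print(round(wer_percent(["the cat sat on the mat"], ["the cat sit on mat"]), 2))
# -> 33.33
```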
Testing Data, Factors & Metrics
Testing Data
[More Information Needed]
Factors
[More Information Needed]
Metrics
[More Information Needed]
Results
[More Information Needed]
Summary
Model Examination [optional]
[More Information Needed]
Environmental Impact
Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).
- Hardware Type: [More Information Needed]
- Hours used: [More Information Needed]
- Cloud Provider: [More Information Needed]
- Compute Region: [More Information Needed]
- Carbon Emitted: [More Information Needed]
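Once the fields above are filled in, a rough estimate can be computed as energy consumed multiplied by the carbon intensity of the grid. The sketch below is a simplified form of that calculation. The function name is hypothetical and the input values in the usage line are placeholders, not measurements for this model.

```python
def estimate_kg_co2e(power_watts: float, hours: float,
                     kg_co2e_per_kwh: float) -> float:
    """Estimate emissions as energy (kWh) times grid carbon intensity."""
    if min(power_watts, hours, kg_co2e_per_kwh) < 0:
        raise ValueError("inputs must be non-negative")
    energy_kwh = power_watts / 1000.0 * hours
    return energy_kwh * kg_co2e_per_kwh


# Placeholder inputs for illustration only: a 250 W device running
# for 100 hours on a grid emitting 0.4 kg CO2e per kWh.
print(f"{estimate_kg_co2e(250.0, 100.0, 0.4):.1f} kg CO2e")
```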
Technical Specifications [optional]
Model Architecture and Objective
[More Information Needed]
Compute Infrastructure
FBK - digis cluster
Hardware
device=device(type='cuda', index=0); CUDA version: 12.5; GPU: Quadro RTX50000
Software
[More Information Needed]
Citation [optional]
G. A. Wright, U. Cappellazzo, S. Zaiem, D. Raj, L. O. Yang, D. Falavigna, M. N. Ali, and A. Brutti, “Training early-exit architectures for automatic speech recognition: Fine-tuning pre-trained models or training from scratch,” in Proc. of ICASSP Workshops, 2024, pp. 685–689.
BibTeX:
[More Information Needed]
APA:
[More Information Needed]
Glossary [optional]
[More Information Needed]
More Information [optional]
[More Information Needed]
Model Card Authors [optional]
[More Information Needed]
Model Card Contact
[More Information Needed]