
Model Card for Model ID

This model card aims to be a base template for new models. It has been generated using this raw template.

Model Details

The model has 31,536,128 trainable parameters.

Model Description

The model was trained with an early-exit architecture: 12 Conformer encoder layers and 6 CTC decoders (one exit every 2 encoder layers, per the configuration below). The released weights were obtained by averaging the checkpoints from epoch 60 to epoch 90.
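Checkpoint averaging is a standard procedure; a minimal sketch of how weights from epochs 60–90 could be averaged is shown below. The checkpoint file names are hypothetical placeholders, not the repository's actual layout.

```python
# Minimal checkpoint-averaging sketch (epochs 60-90). File names are
# hypothetical; the repository may store checkpoints differently.
import torch

ckpt_paths = [f"checkpoint_epoch{e}.pt" for e in range(60, 91)]

avg_state = None
for path in ckpt_paths:
    state = torch.load(path, map_location="cpu")
    if avg_state is None:
        avg_state = {k: v.float().clone() for k, v in state.items()}
    else:
        for k, v in state.items():
            avg_state[k] += v.float()

avg_state = {k: v / len(ckpt_paths) for k, v in avg_state.items()}
torch.save(avg_state, "early_conformer_avg_60_90.pt")
```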

Uses

The model is intended for automatic speech recognition (ASR). Code for using it is available at https://github.com/SpeechTechLab/early-exit-transformer

How to Get Started with the Model

Use the code at https://github.com/SpeechTechLab/early-exit-transformer to build the model and load this checkpoint; a minimal inference outline is sketched below.
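The feature front-end in the sketch follows the configuration listed under Training Details (16 kHz audio, n_fft=512, 80 mel bins). The model-building step depends on the repository's own code, so it is left as a commented placeholder rather than presented as that API; the blank index used for CTC decoding is assumed to be 0.

```python
# Inference outline. The front-end matches the configuration in this card;
# the model construction must come from the repository (its class and
# function names may differ) and is left as a placeholder.
import torch
import torchaudio
import sentencepiece as spm

mel = torchaudio.transforms.MelSpectrogram(
    sample_rate=16000, n_fft=512, win_length=320, hop_length=160, n_mels=80
)

waveform, sr = torchaudio.load("sample.wav")                  # expects 16 kHz mono audio
feats = mel(waveform).clamp(min=1e-10).log().transpose(1, 2)  # (1, frames, 80)

sp = spm.SentencePieceProcessor(model_file="aiXpa_en.bpe-256.model")

# model = ...  # build the early-exit Conformer with the repository code,
#              # load the averaged checkpoint, then run:
# with torch.no_grad():
#     logits = model(feats)[-1]                     # take the last (deepest) exit
#     ids = logits.argmax(dim=-1).squeeze(0).tolist()
# # Greedy CTC decoding: collapse repeats and drop the blank (assumed index 0).
# hyp, prev = [], None
# for i in ids:
#     if i != prev and i != 0:
#         hyp.append(i)
#     prev = i
# print(sp.decode(hyp))
```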

Training Details

decoder_mode='ctc', model_type='early_conformer', bpe=True

distill=False, language_model=None, language_model_dict=None, avg_model_start=60, avg_model_end=90

max_len=2000, d_model=256, n_enc_layers_per_exit=2, n_enc_exits=6, n_dec_layers=6, n_heads=8

d_feed_forward=2048, depthwise_kernel_size=31, max_utterance_length=600, sample_rate=16000

n_fft=512, win_length=320, hop_length=160, n_mels=80

src_pad_idx=0, trg_pad_idx=126, trg_sos_idx=1, trg_eos_idx=2, enc_voc_size=256, dec_voc_size=256

sp=SentencePieceProcessor(model='aiXpa_en.bpe-256.model'), lexicon='aiXpa_en-bpe-256.lex', tokens='aiXpa_en-bpe-256.tok'
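The bpe=True setting and the 256-entry encoder/decoder vocabularies refer to the SentencePiece model named above. As a rough illustration, transcripts can be tokenized as below; how the training code actually applies the special indices (sos=1, eos=2, target pad=126) is an assumption based on this configuration.

```python
# BPE tokenization sketch with the SentencePiece model listed above. The use
# of the special indices (sos=1, eos=2, target pad=126) is an assumption drawn
# from the configuration, not a guarantee of the training code's behavior.
import sentencepiece as spm

sp = spm.SentencePieceProcessor(model_file="aiXpa_en.bpe-256.model")

text = "hello world"
ids = [1] + sp.encode(text, out_type=int) + [2]   # trg_sos_idx=1 ... trg_eos_idx=2
print(ids)
print(sp.decode([i for i in ids if i not in (0, 1, 2, 126)]))  # strip pad/specials
```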

Training Data

LibriSpeech, VoxPopuli, TED-LIUM release 3

Training Procedure

The model was trained from scratch (no pre-trained initialization).

Preprocessing [optional]

[More Information Needed]

Training Hyperparameters

shuffle=True, batch_size=64, n_batch_split=8, drop_prob=0.1, init_lr=1e-05, adam_eps=1e-09, weight_decay=0.0001, warmup=[training dataset size], clip=1.0
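A minimal sketch of the optimizer setup implied by these values is given below. The linear warmup ramp and the warmup-step count are assumptions (the repository's actual schedule may differ), and the stand-in model is only there to make the snippet self-contained.

```python
# Optimizer/clipping sketch implied by the hyperparameters above. The linear
# warmup is an assumption; the repository's actual LR schedule may differ.
import torch

model = torch.nn.Linear(80, 256)   # stand-in for the early-exit Conformer
optimizer = torch.optim.Adam(
    model.parameters(), lr=1e-05, eps=1e-09, weight_decay=0.0001
)

warmup_steps = 100_000             # "warmup=[training dataset size]": placeholder value
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lambda step: min(1.0, (step + 1) / warmup_steps)
)

def train_step(loss):
    optimizer.zero_grad()
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)  # clip=1.0
    optimizer.step()
    scheduler.step()
```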

Speeds, Sizes, Times [optional]

[More Information Needed]

Evaluation (%WER)

LibriSpeech test-clean: 6.73
VoxPopuli: 13.12
TED-LIUM: 11.97
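The figures are word error rates (%WER = 100 · (substitutions + deletions + insertions) / reference words). A minimal way to compute the metric from reference/hypothesis pairs, using torchaudio's word-level edit distance, is sketched below; it is shown only to make the metric concrete, not as the evaluation script used for these numbers.

```python
# Minimal %WER computation over reference/hypothesis pairs, using torchaudio's
# edit-distance helper on word sequences.
import torchaudio.functional as F

def wer(refs, hyps):
    errors = sum(F.edit_distance(r.split(), h.split()) for r, h in zip(refs, hyps))
    words = sum(len(r.split()) for r in refs)
    return 100.0 * errors / words

print(wer(["the cat sat"], ["the cat sat down"]))  # one insertion -> 33.3 %WER
```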

Testing Data, Factors & Metrics

Testing Data

[More Information Needed]

Factors

[More Information Needed]

Metrics

[More Information Needed]

Results

[More Information Needed]

Summary

Model Examination [optional]

[More Information Needed]

Environmental Impact

Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).

  • Hardware Type: [More Information Needed]
  • Hours used: [More Information Needed]
  • Cloud Provider: [More Information Needed]
  • Compute Region: [More Information Needed]
  • Carbon Emitted: [More Information Needed]

Technical Specifications [optional]

Model Architecture and Objective

[More Information Needed]

Compute Infrastructure

FBK - digis cluster

Hardware

NVIDIA Quadro RTX 5000 GPU, CUDA version 12.5 (device: cuda:0)

Software

[More Information Needed]

Citation [optional]

G. A. Wright, U. Cappellazzo, S. Zaiem, D. Raj, L. O. Yang, D. Falavigna, M. N. Ali, and A. Brutti, “Training early-exit architectures for automatic speech recognition: Fine-tuning pre-trained models or training from scratch,” in Proc. of ICASSP Workshops, 2024, pp. 685–689.

BibTeX:

[More Information Needed]

APA:

[More Information Needed]

Glossary [optional]

[More Information Needed]

More Information [optional]

[More Information Needed]

Model Card Authors [optional]

[More Information Needed]

Model Card Contact

[More Information Needed]
