ESPnet-SLU: Advancing Spoken Language Understanding through ESPnet Paper • 2111.14706 • Published Nov 29, 2021
On the Effects of Heterogeneous Data Sources on Speech-to-Text Foundation Models Paper • 2406.09282 • Published Jun 13, 2024
OWLS: Scaling Laws for Multilingual Speech Recognition and Translation Models Paper • 2502.10373 • Published Feb 14, 2025
Granary: Speech Recognition and Translation Dataset in 25 European Languages Paper • 2505.13404 • Published 22 days ago
OWSM v4: Improving Open Whisper-Style Speech Models via Data Scaling and Cleaning Paper • 2506.00338 • Published 11 days ago
The CHiME-7 Challenge: System Description and Performance of NeMo Team's DASR System Paper • 2310.12378 • Published Oct 18, 2023
TitaNet: Neural Model for speaker representation with 1D Depth-wise separable convolutions and global context Paper • 2110.04410 • Published Oct 8, 2021
Enhancing Speaker Diarization with Large Language Models: A Contextual Beam Search Approach Paper • 2309.05248 • Published Sep 11, 2023
Codec-ASR: Training Performant Automatic Speech Recognition Systems with Discrete Speech Representations Paper • 2407.03495 • Published Jul 3, 2024
Spectral Codecs: Spectrogram-Based Audio Codecs for High Quality Speech Synthesis Paper • 2406.05298 • Published Jun 7, 2024
Sortformer: Seamless Integration of Speaker Diarization and ASR by Bridging Timestamps and Tokens Paper • 2409.06656 • Published Sep 10, 2024
NEST: Self-supervised Fast Conformer as All-purpose Seasoning to Speech Processing Tasks Paper • 2408.13106 • Published Aug 23, 2024
Property-Aware Multi-Speaker Data Simulation: A Probabilistic Modelling Technique for Synthetic Data Generation Paper • 2310.12371 • Published Oct 18, 2023
Less is More: Accurate Speech Recognition & Translation without Web-Scale Data Paper • 2406.19674 • Published Jun 28, 2024
Training and Inference Efficiency of Encoder-Decoder Speech Models Paper • 2503.05931 • Published Mar 7, 2025
ESPnet-EZ: Python-only ESPnet for Easy Fine-tuning and Integration Paper • 2409.09506 • Published Sep 14, 2024
Towards Robust Speech Representation Learning for Thousands of Languages Paper • 2407.00837 • Published Jun 30, 2024
OWSM-CTC: An Open Encoder-Only Speech Foundation Model for Speech Recognition, Translation, and Language Identification Paper • 2402.12654 • Published Feb 20, 2024
E-Branchformer: Branchformer with Enhanced merging for speech recognition Paper • 2210.00077 • Published Sep 30, 2022