Nvidia Data&Tools team

company

https://www.nvidia.com/en-us/ai-data-science/products/nemo/

https://github.com/NVIDIA/NeMo

NeMoDataAndTools's activity

pyf98

authored 5 papers 8 days ago

ESPnet-SLU: Advancing Spoken Language Understanding through ESPnet

Paper • 2111.14706 • Published Nov 29, 2021

On the Effects of Heterogeneous Data Sources on Speech-to-Text Foundation Models

Paper • 2406.09282 • Published Jun 13, 2024

OWLS: Scaling Laws for Multilingual Speech Recognition and Translation Models

Paper • 2502.10373 • Published Feb 14, 2025

Granary: Speech Recognition and Translation Dataset in 25 European Languages

Paper • 2505.13404 • Published 22 days ago

OWSM v4: Improving Open Whisper-Style Speech Models via Data Scaling and Cleaning

Paper • 2506.00338 • Published 11 days ago • 8

nithinraok

authored 10 papers 3 months ago

The CHiME-7 Challenge: System Description and Performance of NeMo Team's DASR System

Paper • 2310.12378 • Published Oct 18, 2023

TitaNet: Neural Model for speaker representation with 1D Depth-wise separable convolutions and global context

Paper • 2110.04410 • Published Oct 8, 2021

Enhancing Speaker Diarization with Large Language Models: A Contextual Beam Search Approach

Paper • 2309.05248 • Published Sep 11, 2023

Codec-ASR: Training Performant Automatic Speech Recognition Systems with Discrete Speech Representations

Paper • 2407.03495 • Published Jul 3, 2024 • 1

Spectral Codecs: Spectrogram-Based Audio Codecs for High Quality Speech Synthesis

Paper • 2406.05298 • Published Jun 7, 2024

Sortformer: Seamless Integration of Speaker Diarization and ASR by Bridging Timestamps and Tokens

Paper • 2409.06656 • Published Sep 10, 2024

NEST: Self-supervised Fast Conformer as All-purpose Seasoning to Speech Processing Tasks

Paper • 2408.13106 • Published Aug 23, 2024 • 1

Property-Aware Multi-Speaker Data Simulation: A Probabilistic Modelling Technique for Synthetic Data Generation

Paper • 2310.12371 • Published Oct 18, 2023

Less is More: Accurate Speech Recognition & Translation without Web-Scale Data

Paper • 2406.19674 • Published Jun 28, 2024

Training and Inference Efficiency of Encoder-Decoder Speech Models

Paper • 2503.05931 • Published Mar 7, 2025 • 3

pyf98

authored a paper 9 months ago

ESPnet-EZ: Python-only ESPnet for Easy Fine-tuning and Integration

Paper • 2409.09506 • Published Sep 14, 2024 • 4

sw005320

authored a paper 11 months ago

Towards Robust Speech Representation Learning for Thousands of Languages

Paper • 2407.00837 • Published Jun 30, 2024 • 11

pyf98

authored a paper 11 months ago

Towards Robust Speech Representation Learning for Thousands of Languages

Paper • 2407.00837 • Published Jun 30, 2024 • 11

pyf98

authored 2 papers over 1 year ago

OWSM-CTC: An Open Encoder-Only Speech Foundation Model for Speech Recognition, Translation, and Language Identification

Paper • 2402.12654 • Published Feb 20, 2024 • 1

E-Branchformer: Branchformer with Enhanced merging for speech recognition

Paper • 2210.00077 • Published Sep 30, 2022 • 2

Team members 7
