CAMEL: Cross-Attention Enhanced Mixture-of-Experts and Language Bias for Code-Switching Speech Recognition Paper • 2412.12760 • Published Dec 17, 2024
ContextASR-Bench: A Massive Contextual Speech Recognition Benchmark Paper • 2507.05727 • Published Jul 2025
Enhancing Lip Reading with Multi-Scale Video and Multi-Encoder Paper • 2404.05466 • Published Apr 8, 2024
The NPU-ASLP System Description for Visual Speech Recognition in CNVSRC 2024 Paper • 2408.02369 • Published Aug 5, 2024
MLCA-AVSR: Multi-Layer Cross Attention Fusion based Audio-Visual Speech Recognition Paper • 2401.03424 • Published Jan 7, 2024
The NPU-ASLP-LiAuto System Description for Visual Speech Recognition in CNVSRC 2023 Paper • 2401.06788 • Published Jan 7, 2024
ICMC-ASR: The ICASSP 2024 In-Car Multi-Channel Automatic Speech Recognition Challenge Paper • 2401.03473 • Published Jan 7, 2024
The NPU-ASLP System for Audio-Visual Speech Recognition in MISP 2022 Challenge Paper • 2303.06341 • Published Mar 11, 2023
VE-KWS: Visual Modality Enhanced End-to-End Keyword Spotting Paper • 2302.13523 • Published Feb 27, 2023
AISHELL-5: The First Open-Source In-Car Multi-Channel Multi-Speaker Speech Dataset for Automatic Speech Diarization and Recognition Paper • 2505.23036 • Published May 29, 2025