Efficient LLaMA-3.2-Vision by Trimming Cross-attended Visual Features Paper • 2504.00557 • Published 20 days ago
EnCLAP: Combining Neural Audio Codec and Audio-Text Joint Embedding for Automated Audio Captioning Paper • 2401.17690 • Published Jan 31, 2024
EnCLAP++: Analyzing the EnCLAP Framework for Optimizing Automated Audio Captioning Performance Paper • 2409.01201 • Published Sep 2, 2024