Unified Speech Recognition: A Single Model for Auditory, Visual, and Audiovisual Inputs Paper • 2411.02256 • Published Nov 4, 2024 • 1
AV-Deepfake1M: A Large-Scale LLM-Driven Audio-Visual Deepfake Dataset Paper • 2311.15308 • Published Nov 26, 2023 • 1
Sleeping • 5 • Gradio Demo Space creation helper V2 • Generate Gradio demo files for Hugging Face model repos
Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models Paper • 2409.17146 • Published Sep 25, 2024 • 106
Running on CPU Upgrade • 609 • Open ASR Leaderboard • Request evaluation results for a speech model
Running on A10G • 299 • AudioLDM2 Text2Audio Text2Music Generation • Generate a video waveform from text-based audio descriptions
Running on CPU Upgrade • 9.32k • AI Comic Factory • Create your own AI comic with a single prompt