Audio-Visual Glance Network for Efficient Video Recognition Paper • 2308.09322 • Published Aug 18, 2023
Modality Mixer Exploiting Complementary Information for Multi-modal Action Recognition Paper • 2311.12344 • Published Nov 21, 2023
HarmonyView: Harmonizing Consistency and Diversity in One-Image-to-3D Paper • 2312.15980 • Published Dec 26, 2023
Switch Diffusion Transformer: Synergizing Denoising Tasks with Sparse Mixture-of-Experts Paper • 2403.09176 • Published Mar 14, 2024
Tackling the Challenges in Scene Graph Generation with Local-to-Global Interactions Paper • 2106.08543 • Published Jun 16, 2021
Explore-And-Match: Bridging Proposal-Based and Proposal-Free With Transformer for Sentence Grounding in Videos Paper • 2201.10168 • Published Jan 25, 2022
Temporal Flow Mask Attention for Open-Set Long-Tailed Recognition of Wild Animals in Camera-Trap Images Paper • 2208.14625 • Published Aug 31, 2022
What and When to Look?: Temporal Span Proposal Network for Video Relation Detection Paper • 2107.07154 • Published Jul 15, 2021
Towards Good Practices for Missing Modality Robust Action Recognition Paper • 2211.13916 • Published Nov 25, 2022
Don't Miss the Forest for the Trees: Attentional Vision Calibration for Large Vision Language Models Paper • 2405.17820 • Published May 28, 2024
RITUAL: Random Image Transformations as a Universal Anti-hallucination Lever in LVLMs Paper • 2405.17821 • Published May 28, 2024