PEEK: Picking Essential frames via Efficient Knowledge distillation • Paper • 2605.31029 • Published 5 days ago • 17
Multimodal Chaptering for Long-Form TV Newscast Video • Paper • 2406.17590 • Published Mar 20, 2024 • 2
Disability Representations: Finding Biases in Automatic Image Generation • Paper • 2406.14993 • Published Jun 21, 2024 • 1
Towards Retrieval Augmented Generation over Large Video Libraries • Paper • 2406.14938 • Published Jun 21, 2024 • 22
Inserting Faces inside Captions: Image Captioning with Attention Guided Merging • Paper • 2405.02305 • Published Mar 20, 2024 • 2