Learning Adaptive Parallel Reasoning with Language Models Paper • 2504.15466 • Published 16 days ago • 42
Describe Anything: Detailed Localized Image and Video Captioning Paper • 2504.16072 • Published 15 days ago • 60
Unified Visual Relationship Detection with Vision and Language Models Paper • 2303.08998 • Published Mar 16, 2023
The iNaturalist Species Classification and Detection Dataset Paper • 1707.06642 • Published Jul 20, 2017
Alternating Gradient Descent and Mixture-of-Experts for Integrated Multimodal Perception Paper • 2305.06324 • Published May 10, 2023 • 1
Fashionpedia: Ontology, Segmentation, and an Attribute Localization Dataset Paper • 2004.12276 • Published Apr 26, 2020 • 1
Simple Copy-Paste is a Strong Data Augmentation Method for Instance Segmentation Paper • 2012.07177 • Published Dec 13, 2020
DaTaSeg: Taming a Universal Multi-Dataset Multi-Task Segmentation Model Paper • 2306.01736 • Published Jun 2, 2023 • 1
Open-vocabulary Object Detection via Vision and Language Knowledge Distillation Paper • 2104.13921 • Published Apr 28, 2021
VideoGLUE: Video General Understanding Evaluation of Foundation Models Paper • 2307.03166 • Published Jul 6, 2023 • 5
A Simple Zero-shot Prompt Weighting Technique to Improve Prompt Ensembling in Text-Image Models Paper • 2302.06235 • Published Feb 13, 2023
Edify Image: High-Quality Image Generation with Pixel Space Laplacian Diffusion Models Paper • 2411.07126 • Published Nov 11, 2024 • 31
Cosmos-Reason1: From Physical Common Sense To Embodied Reasoning Paper • 2503.15558 • Published Mar 18 • 46
Atlas: Multi-Scale Attention Improves Long Context Image Modeling Paper • 2503.12355 • Published Mar 16 • 11
Wolf: Captioning Everything with a World Summarization Framework Paper • 2407.18908 • Published Jul 26, 2024 • 33
Visual Fact Checker: Enabling High-Fidelity Detailed Caption Generation Paper • 2404.19752 • Published Apr 30, 2024 • 25