Direct Inversion: Boosting Diffusion-based Editing with 3 Lines of Code Paper • 2310.01506 • Published Oct 2, 2023
RL-GPT: Integrating Reinforcement Learning and Code-as-policy Paper • 2402.19299 • Published Feb 29, 2024
Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models Paper • 2403.18814 • Published Mar 27, 2024
Multi-modal Cooking Workflow Construction for Food Recipes Paper • 2008.09151 • Published Aug 20, 2020
VisionZip: Longer is Better but Not Necessary in Vision Language Models Paper • 2412.04467 • Published Dec 5, 2024