InstantDrag: Improving Interactivity in Drag-based Image Editing Paper • 2409.08857 • Published 9 days ago • 28
Instant Facial Gaussians Translator for Relightable and Interactable Facial Rendering Paper • 2409.07441 • Published 11 days ago • 9
Hi3D: Pursuing High-Resolution Image-to-3D Generation with Video Diffusion Models Paper • 2409.07452 • Published 11 days ago • 18
Gradio Spaces for Background Removal Collection • Enhance your images by removing the background. These Spaces will be kept up and maintained for the community. • 5 items • Updated Aug 20 • 23
Puppet-Master: Scaling Interactive Video Generation as a Motion Prior for Part-Level Dynamics Paper • 2408.04631 • Published Aug 8 • 8
Transformer Explainer: Interactive Learning of Text-Generative Models Paper • 2408.04619 • Published Aug 8 • 152
Article ColPali: Efficient Document Retrieval with Vision Language Models 👀 By manu • Jul 5 • 87
Open-Vocabulary SAM: Segment and Recognize Twenty-thousand Classes Interactively Paper • 2401.02955 • Published Jan 5 • 19
ConsistentID: Portrait Generation with Multimodal Fine-Grained Identity Preserving Paper • 2404.16771 • Published Apr 25 • 16
Revisiting Text-to-Image Evaluation with Gecko: On Metrics, Prompts, and Human Ratings Paper • 2404.16820 • Published Apr 25 • 15
ControlNet++: Improving Conditional Controls with Efficient Consistency Feedback Paper • 2404.07987 • Published Apr 11 • 47
InstantStyle: Free Lunch towards Style-Preserving in Text-to-Image Generation Paper • 2404.02733 • Published Apr 3 • 20
CoMat: Aligning Text-to-Image Diffusion Model with Image-to-Text Concept Matching Paper • 2404.03653 • Published Apr 4 • 32
LVLM-Intrepret: An Interpretability Tool for Large Vision-Language Models Paper • 2404.03118 • Published Apr 3 • 23
MiniGPT4-Video: Advancing Multimodal LLMs for Video Understanding with Interleaved Visual-Textual Tokens Paper • 2404.03413 • Published Apr 4 • 25
AnyV2V: A Plug-and-Play Framework For Any Video-to-Video Editing Tasks Paper • 2403.14468 • Published Mar 21 • 21
RadSplat: Radiance Field-Informed Gaussian Splatting for Robust Real-Time Rendering with 900+ FPS Paper • 2403.13806 • Published Mar 20 • 18
Learning Generalizable Feature Fields for Mobile Manipulation Paper • 2403.07563 • Published Mar 12 • 6
Motion Mamba: Efficient and Long Sequence Motion Generation with Hierarchical and Bidirectional Selective SSM Paper • 2403.07487 • Published Mar 12 • 13
OOTDiffusion: Outfitting Fusion based Latent Diffusion for Controllable Virtual Try-on Paper • 2403.01779 • Published Mar 4 • 26
Make a Cheap Scaling: A Self-Cascade Diffusion Model for Higher-Resolution Adaptation Paper • 2402.10491 • Published Feb 16 • 16
Object-Driven One-Shot Fine-tuning of Text-to-Image Diffusion with Prototypical Embedding Paper • 2401.15708 • Published Jan 28 • 10
PALP: Prompt Aligned Personalization of Text-to-Image Models Paper • 2401.06105 • Published Jan 11 • 46
Pearl: A Production-ready Reinforcement Learning Agent Paper • 2312.03814 • Published Dec 6, 2023 • 14
DiffiT: Diffusion Vision Transformers for Image Generation Paper • 2312.02139 • Published Dec 4, 2023 • 13
LLaVA-Grounding: Grounded Visual Chat with Large Multimodal Models Paper • 2312.02949 • Published Dec 5, 2023 • 11
Generating Fine-Grained Human Motions Using ChatGPT-Refined Descriptions Paper • 2312.02772 • Published Dec 5, 2023 • 6
Alchemist: Parametric Control of Material Properties with Diffusion Models Paper • 2312.02970 • Published Dec 5, 2023 • 7
Rank-without-GPT: Building GPT-Independent Listwise Rerankers on Open-Source Large Language Models Paper • 2312.02969 • Published Dec 5, 2023 • 12
Fine-grained Controllable Video Generation via Object Appearance and Context Paper • 2312.02919 • Published Dec 5, 2023 • 10
LivePhoto: Real Image Animation with Text-guided Motion Control Paper • 2312.02928 • Published Dec 5, 2023 • 16