SportsSloMo: A New Benchmark and Baselines for Human-centric Video Frame Interpolation Paper • 2308.16876 • Published Aug 31, 2023 • 9
MTVCrafter: 4D Motion Tokenization for Open-World Human Image Animation Paper • 2505.10238 • Published May 15 • 10
VGGHeads: A Large-Scale Synthetic Dataset for 3D Human Heads Paper • 2407.18245 • Published Jul 25, 2024 • 11
Evaluating Multiview Object Consistency in Humans and Image Models Paper • 2409.05862 • Published Sep 9, 2024 • 11
MaGGIe: Masked Guided Gradual Human Instance Matting Paper • 2404.16035 • Published Apr 24, 2024 • 12
UniPose: A Unified Multimodal Framework for Human Pose Comprehension, Generation and Editing Paper • 2411.16781 • Published Nov 25, 2024 • 12
MotionLab: Unified Human Motion Generation and Editing via the Motion-Condition-Motion Paradigm Paper • 2502.02358 • Published Feb 4 • 19
HumanVid: Demystifying Training Data for Camera-controllable Human Image Animation Paper • 2407.17438 • Published Jul 24, 2024 • 27
Whole-Body Conditioned Egocentric Video Prediction Paper • 2506.21552 • Published 3 days ago • 6
FaSTA*: Fast-Slow Toolpath Agent with Subroutine Mining for Efficient Multi-turn Image Editing Paper • 2506.20911 • Published 3 days ago • 37
HiWave: Training-Free High-Resolution Image Generation via Wavelet-Based Diffusion Sampling Paper • 2506.20452 • Published 4 days ago • 12
Vision-Guided Chunking Is All You Need: Enhancing RAG with Multimodal Document Understanding Paper • 2506.16035 • Published 10 days ago • 79
Drag-and-Drop LLMs: Zero-Shot Prompt-to-Weights Paper • 2506.16406 • Published 10 days ago • 108
All is Not Lost: LLM Recovery without Checkpoints Paper • 2506.15461 • Published 11 days ago • 35
(LoRA) Fine-Tuning FLUX.1-dev on Consumer Hardware Article • By derekl35 and 4 others • 10 days ago • 66
Universal Jailbreak Suffixes Are Strong Attention Hijackers Paper • 2506.12880 • Published 14 days ago • 5
TR2M: Transferring Monocular Relative Depth to Metric Depth with Language Descriptions and Scale-Oriented Contrast Paper • 2506.13387 • Published 13 days ago • 3
Stream-Omni: Simultaneous Multimodal Interactions with Large Language-Vision-Speech Model Paper • 2506.13642 • Published 13 days ago • 26
LoRA-Edit: Controllable First-Frame-Guided Video Editing via Mask-Aware LoRA Fine-Tuning Paper • 2506.10082 • Published 18 days ago • 8