CHOrD: Generation of Collision-Free, House-Scale, and Organized Digital Twins for 3D Indoor Scenes with Controllable Floor Plans and Optimal Layouts Paper • 2503.11958 • Published 13 days ago • 3
PLADIS: Pushing the Limits of Attention in Diffusion Models at Inference Time by Leveraging Sparsity Paper • 2503.07677 • Published 18 days ago • 81
BlobCtrl: A Unified and Flexible Framework for Element-level Image Generation and Editing Paper • 2503.13434 • Published 10 days ago • 24
Personalize Anything for Free with Diffusion Transformer Paper • 2503.12590 • Published 11 days ago • 41
Pensez: Less Data, Better Reasoning -- Rethinking French LLM Paper • 2503.13661 • Published 10 days ago • 5
Reflect-DiT: Inference-Time Scaling for Text-to-Image Diffusion Transformers via In-Context Reflection Paper • 2503.12271 • Published 12 days ago • 9
Florenz: Scaling Laws for Systematic Generalization in Vision-Language Models Paper • 2503.09443 • Published 15 days ago • 7
FlexWorld: Progressively Expanding 3D Scenes for Flexiable-View Synthesis Paper • 2503.13265 • Published 10 days ago • 15
Infinite Mobility: Scalable High-Fidelity Synthesis of Articulated Objects via Procedural Generation Paper • 2503.13424 • Published 10 days ago • 26
CapArena: Benchmarking and Analyzing Detailed Image Captioning in the LLM Era Paper • 2503.12329 • Published 12 days ago • 24
SmolDocling: An ultra-compact vision-language model for end-to-end multi-modal document conversion Paper • 2503.11576 • Published 13 days ago • 75
ReCamMaster: Camera-Controlled Generative Rendering from A Single Video Paper • 2503.11647 • Published 13 days ago • 123