Common Pile v0.1 Filtered Data Collection An LLM pre-training dataset produced by filtering and deduplicating the raw text collected in the Common Pile v0.1 • 31 items • Updated Jun 6 • 13
Efficient Part-level 3D Object Generation via Dual Volume Packing Paper • 2506.09980 • Published Jun 11 • 8
Voila Collection Voila: Voice-Language Foundation Models. https://voila.maitrix.org • 7 items • Updated May 6 • 23
Voila: Voice-Language Foundation Models for Real-Time Autonomous Interaction and Voice Role-Play Paper • 2505.02707 • Published May 5 • 84
PixelHacker: Image Inpainting with Structural and Semantic Consistency Paper • 2504.20438 • Published Apr 29 • 44
Orpheus Multilingual Research Release Collection Beta Release of multilingual models. • 12 items • Updated Apr 10 • 92
TextCrafter: Accurately Rendering Multiple Texts in Complex Visual Scenes Paper • 2503.23461 • Published Mar 30 • 95
TheoremExplainAgent: Towards Multimodal Explanations for LLM Theorem Understanding Paper • 2502.19400 • Published Feb 26 • 49
ART: Anonymous Region Transformer for Variable Multi-Layer Transparent Image Generation Paper • 2502.18364 • Published Feb 25 • 37
MutaGReP: Execution-Free Repository-Grounded Plan Search for Code-Use Paper • 2502.15872 • Published Feb 21 • 5
view article Article Open-source DeepResearch – Freeing our search agents By m-ric and 4 others • Feb 4 • 1.27k
Qwen2.5-VL Collection Vision-language model series based on Qwen2.5 • 11 items • Updated Apr 28 • 502
FilmAgent: A Multi-Agent Framework for End-to-End Film Automation in Virtual 3D Spaces Paper • 2501.12909 • Published Jan 22 • 71
MMVU: Measuring Expert-Level Multi-Discipline Video Understanding Paper • 2501.12380 • Published Jan 21 • 86
HiFi-SR: A Unified Generative Transformer-Convolutional Adversarial Network for High-Fidelity Speech Super-Resolution Paper • 2501.10045 • Published Jan 17 • 9
SynthLight: Portrait Relighting with Diffusion Model by Learning to Re-render Synthetic Faces Paper • 2501.09756 • Published Jan 16 • 19