ERNIE 4.5 Collection collection of ERNIE 4.5 models. "-Paddle" models use PaddlePaddle weights, while "-PT" models use Transformer-style PyTorch weights. • 25 items • Updated Jul 11 • 159
Qwen2.5-VL Collection Vision-language model series based on Qwen2.5 • 11 items • Updated Jul 21 • 531
Phi-4 Collection Phi-4 family of small language, multi-modal and reasoning models. • 17 items • Updated Jul 10 • 178
Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention Paper • 2502.11089 • Published Feb 16 • 165
Video2Game: Real-time, Interactive, Realistic and Browser-Compatible Environment from a Single Video Paper • 2404.09833 • Published Apr 15, 2024 • 31
Masked Audio Generation using a Single Non-Autoregressive Transformer Paper • 2401.04577 • Published Jan 9, 2024 • 44
DocLLM: A layout-aware generative language model for multimodal document understanding Paper • 2401.00908 • Published Dec 31, 2023 • 189
DL3DV-10K: A Large-Scale Scene Dataset for Deep Learning-based 3D Vision Paper • 2312.16256 • Published Dec 26, 2023 • 18
PlatoNeRF: 3D Reconstruction in Plato's Cave via Single-View Two-Bounce Lidar Paper • 2312.14239 • Published Dec 21, 2023 • 12
Amphion: An Open-Source Audio, Music and Speech Generation Toolkit Paper • 2312.09911 • Published Dec 15, 2023 • 55