Being-H0: Vision-Language-Action Pretraining from Large-Scale Human Videos (arXiv:2507.15597, published Jul 21)
Unified Multimodal Understanding via Byte-Pair Visual Encoding (arXiv:2506.23639, published Jun 30)
EfficientLLM: Scalable Pruning-Aware Pretraining for Architecture-Agnostic Edge Language Models (arXiv:2502.06663, published Feb 10)
Entity Divider with Language Grounding in Multi-Agent Reinforcement Learning (arXiv:2210.13942, published Oct 25, 2022)
From Pixels to Tokens: Byte-Pair Encoding on Quantized Visual Modalities (arXiv:2410.02155, published Oct 3, 2024)
RLAdapter: Bridging Large Language Models to Reinforcement Learning in Open Worlds (arXiv:2309.17176, published Sep 29, 2023)
Being-0: A Humanoid Robotic Agent with Vision-Language Models and Modular Skills (arXiv:2503.12533, published Mar 16)