- Unified Vision-Language-Action Model
  Paper • 2506.19850 • Published • 21
- SmolVLA: A Vision-Language-Action Model for Affordable and Efficient Robotics
  Paper • 2506.01844 • Published • 104
- 3D-VLA: A 3D Vision-Language-Action Generative World Model
  Paper • 2403.09631 • Published • 10
- QUAR-VLA: Vision-Language-Action Model for Quadruped Robots
  Paper • 2312.14457 • Published • 1

Collections
Collections including paper arxiv:2506.19850
- microsoft/bitnet-b1.58-2B-4T
  Text Generation • 0.8B • Updated • 10.2k • 1.1k
- M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models
  Paper • 2504.10449 • Published • 12
- nvidia/Llama-3.1-Nemotron-8B-UltraLong-2M-Instruct
  Text Generation • 8B • Updated • 1.81k • 15
- ReTool: Reinforcement Learning for Strategic Tool Use in LLMs
  Paper • 2504.11536 • Published • 60

- LLM Pruning and Distillation in Practice: The Minitron Approach
  Paper • 2408.11796 • Published • 59
- TableBench: A Comprehensive and Complex Benchmark for Table Question Answering
  Paper • 2408.09174 • Published • 53
- To Code, or Not To Code? Exploring Impact of Code in Pre-training
  Paper • 2408.10914 • Published • 43
- Open-FinLLMs: Open Multimodal Large Language Models for Financial Applications
  Paper • 2408.11878 • Published • 62

- ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models
  Paper • 2505.24864 • Published • 131
- ComfyUI-Copilot: An Intelligent Assistant for Automated Workflow Development
  Paper • 2506.05010 • Published • 69
- SeedVR2: One-Step Video Restoration via Diffusion Adversarial Post-Training
  Paper • 2506.05301 • Published • 55
- LLaDA-V: Large Language Diffusion Models with Visual Instruction Tuning
  Paper • 2505.16933 • Published • 32

- Gemini Robotics: Bringing AI into the Physical World
  Paper • 2503.20020 • Published • 27
- Magma: A Foundation Model for Multimodal AI Agents
  Paper • 2502.13130 • Published • 58
- LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents
  Paper • 2311.05437 • Published • 51
- OS-ATLAS: A Foundation Action Model for Generalist GUI Agents
  Paper • 2410.23218 • Published • 51

- FLAME: Factuality-Aware Alignment for Large Language Models
  Paper • 2405.01525 • Published • 29
- DeepSeek-Prover: Advancing Theorem Proving in LLMs through Large-Scale Synthetic Data
  Paper • 2405.14333 • Published • 41
- Transformers Can Do Arithmetic with the Right Embeddings
  Paper • 2405.17399 • Published • 55
- EasyAnimate: A High-Performance Long Video Generation Method based on Transformer Architecture
  Paper • 2405.18991 • Published • 12