Simon DL's picture

33 10

Simon DL

SimonDL

·

Simon-dl

AI & ML interests

Reinforcement Learning

Organizations

None yet

SimonDL's activity

upvoted an article 4 months ago

Article

π0 and π0-FAST: Vision-Language-Action Models for General Robot Control

By

and 3 others •

Feb 4

• 161

upvoted an article 5 months ago

Article

The Large Language Model Course

By

•

Jan 16

• 187

upvoted 9 papers 7 months ago

ShowUI: One Vision-Language-Action Model for GUI Visual Agent

Paper • 2411.17465 • Published Nov 26, 2024 • 88

O1 Replication Journey -- Part 2: Surpassing O1-preview through Simple Distillation, Big Progress or Bitter Lesson?

Paper • 2411.16489 • Published Nov 25, 2024 • 49

BALROG: Benchmarking Agentic LLM and VLM Reasoning On Games

Paper • 2411.13543 • Published Nov 20, 2024 • 18

RedPajama: an Open Dataset for Training Large Language Models

Paper • 2411.12372 • Published Nov 19, 2024 • 56

LLaVA-o1: Let Vision Language Models Reason Step-by-Step

Paper • 2411.10440 • Published Nov 15, 2024 • 125

Sharingan: Extract User Action Sequence from Desktop Recordings

Paper • 2411.08768 • Published Nov 13, 2024 • 10

Both Text and Images Leaked! A Systematic Analysis of Multimodal LLM Data Contamination

Paper • 2411.03823 • Published Nov 6, 2024 • 50

Precise and Dexterous Robotic Manipulation via Human-in-the-Loop Reinforcement Learning

Paper • 2410.21845 • Published Oct 29, 2024 • 14

Robots Pre-train Robots: Manipulation-Centric Robotic Representation from Large-Scale Robot Dataset

Paper • 2410.22325 • Published Oct 29, 2024 • 10

upvoted 9 papers 8 months ago

SMITE: Segment Me In TimE

Paper • 2410.18538 • Published Oct 24, 2024 • 16

MMAU: A Massive Multi-Task Audio Understanding and Reasoning Benchmark

Paper • 2410.19168 • Published Oct 24, 2024 • 20

ROCKET-1: Master Open-World Interaction with Visual-Temporal Context Prompting

Paper • 2410.17856 • Published Oct 23, 2024 • 52

PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction

Paper • 2410.17247 • Published Oct 22, 2024 • 48

JMMMU: A Japanese Massive Multi-discipline Multimodal Understanding Benchmark for Culture-aware Evaluation

Paper • 2410.17250 • Published Oct 22, 2024 • 15

Agent-to-Sim: Learning Interactive Behavior Models from Casual Longitudinal Videos

Paper • 2410.16259 • Published Oct 21, 2024 • 5

Mini-Omni2: Towards Open-source GPT-4o with Vision, Speech and Duplex Capabilities

Paper • 2410.11190 • Published Oct 15, 2024 • 22

Web Agents with World Models: Learning and Leveraging Environment Dynamics in Web Navigation

Paper • 2410.13232 • Published Oct 17, 2024 • 45

Generalizable Humanoid Manipulation with Improved 3D Diffusion Policies

Paper • 2410.10803 • Published Oct 14, 2024 • 7