Om AI Lab

Team

company

https://github.com/om-ai-lab

OmAI_lab

om-ai-lab

AI & ML interests

Multimodal AI, VLM, VLA, VAM, etc.

Recent Activity

tianchez authored a paper 18 days ago

VLM-R1: A Stable and Generalizable R1-style Large Vision-Language Model

tianchez authored a paper 18 days ago

Evaluating and Enhancing LLMs for Multi-turn Text-to-SQL with Multiple Question Types

tianchez authored a paper 18 days ago

ImageRAG: Enhancing Ultra High Resolution Remote Sensing Imagery Analysis with ImageRAG

Papers

Which Pretraining Paradigm Better Serves Spatial Intelligence? An Empirical Comparison of Vision-Language and Video Generation Models

VLM-FO1: Bridging the Gap Between High-Level Reasoning and Fine-Grained Perception in VLMs

Articles

VLX-Go: Vision-Language Short-Horizon Waypoint Prediction for Embodied Navigation

19 days ago

• 12

VLX-Seek: Improving VLM Fine-Grained Perception via Region Reference Instead of Coordinate Generation

20 days ago

• 14

VLX-Flow: Continuous Video Understanding for Real-Time Multimodal Interaction

20 days ago

• 15

Trials, Errors, and Breakthroughs: Our Rocky Road to OVD SOTA with Reinforcement Learning

Mar 25, 2025

• 3

Improving Object Detection through Reinforcement Learning with VLM-R1

Mar 25, 2025

• 4

Organization Card

Om AI Lab is a passionate group building multimodal AI agents that reshape our work and life.

Collections 5

Spaces 5

Open Agent Leaderboard

🥇

Open Agent Leaderboard

VLM R1 Referral Expression

💬

Mark regions in images based on text descriptions

OmAgent

💬

Process and answer questions about webpage videos

VLM R1 OVD

👁

VLM-R1 model for Open-Vocabulary Object Detection

Models 9

Datasets 12

omlab/SARDet_REC6_NORM-FS

Viewer • Updated Feb 4 • 968 • 32

omlab/SARDet_REC6-FS

Viewer • Updated Feb 4 • 968 • 14

omlab/SARDet3-FS

Viewer • Updated Feb 1 • 270 • 15

omlab/Cross_DIOR-RSVG

Viewer • Updated Oct 2, 2025 • 7.42k • 12

omlab/Cross_RRSIS-D

Viewer • Updated Oct 2, 2025 • 3.48k • 44

omlab/VRSBench-FS

Viewer • Updated Oct 2, 2025 • 16.6k • 35 • 1

omlab/NWPU-FS

Viewer • Updated Oct 2, 2025 • 39 • 13

omlab/EarthReason-FS

Viewer • Updated Oct 2, 2025 • 3.39k • 63 • 1

omlab/VLM-R1

Preview • Updated Apr 23, 2025 • 383 • 18

omlab/RS5M

Viewer • Updated Mar 16, 2025 • 7.25M • 187 • 1

Team members 3