Tony Zhao

tianchez

https://www.tianchez.com

AI & ML interests

Multimodal Agent, Generative AI

tianchez's activity

commented on a paper about 2 months ago

VLM-R1: A Stable and Generalizable R1-style Large Vision-Language Model

Paper • 2504.07615 • Published Apr 10 • 32

updated 3 models about 2 months ago

omlab/VLM-R1-Qwen2.5VL-3B-Math-0305

Visual Question Answering • Updated Apr 14 • 3.05k • 3

omlab/Qwen2.5VL-3B-VLM-R1-REC-500steps

Zero-Shot Object Detection • Updated Apr 14 • 1.54k • 22

omlab/VLM-R1-Qwen2.5VL-3B-OVD-0321

Zero-Shot Object Detection • Updated Apr 14 • 1.22k • 14

updated a collection about 2 months ago

Multimodal Research

10 items • Updated Apr 14 • 2

updated a Space about 2 months ago

VLM R1 Referral Expression

Mark regions in images based on text descriptions

upvoted a paper about 2 months ago

VLM-R1: A Stable and Generalizable R1-style Large Vision-Language Model

Paper • 2504.07615 • Published Apr 10 • 32

upvoted 2 articles 3 months ago

Article

Improving Object Detection through Reinforcement Learning with VLM-R1

By and 5 others • Mar 25 • 2

Article

Trials, Errors, and Breakthroughs: Our Rocky Road to OVD SOTA with Reinforcement Learning

By and 5 others • Mar 25 • 1

replied to AdinaY's post 3 months ago

https://huggingface.co/blog/omlab/vlm-r1-for-ovd
https://huggingface.co/blog/omlab/vlm-ovd-findings

replied to AdinaY's post 3 months ago

We now share our latest insights in the blog here.
https://om-ai-lab.github.io/index.html

liked 2 Spaces 3 months ago

OmAgent

Process and answer questions about webpage videos

VLM R1 OVD

VLM-R1 model for Open-Vocabulary Object Detection

published a Space 3 months ago

VLM R1 OVD

VLM-R1 model for Open-Vocabulary Object Detection

upvoted a collection 3 months ago

VLM-R1-models

A collection of VLM-R1 Models • 7 items • Updated Mar 22 • 4

New activity in omlab/VLM-R1-Referral-Expression 3 months ago

Apply for community grant: Personal project (gpu)

#3 opened 3 months ago by

replied to their post 4 months ago

looks very cool!

reacted to their post with 👍 4 months ago

Post

4438

Introducing VLM-R1!

GRPO helped DeepSeek R1 learn to reason. Can it also help VLMs perform better on general computer vision tasks?

The answer is YES, and it generalizes better than SFT. We trained Qwen 2.5 VL 3B on RefCOCO (a visual grounding task) and evaluated it on RefCOCO Val and RefGTA (an OOD task).

https://github.com/om-ai-lab/VLM-R1

3 replies

New activity in omlab/VLM-R1-Referral-Expression 4 months ago

Fixes 500 error for some users

#1 opened 4 months ago by

reacted to their post with ❤️ 4 months ago
