Yan Shu

sy1998

AI & ML interests

None yet

Recent Activity

upvoted a paper 1 day ago

When Semantics Mislead Vision: Mitigating Large Multimodal Models Hallucinations in Scene Text Spotting and Understanding

commented on a paper 1 day ago

When Semantics Mislead Vision: Mitigating Large Multimodal Models Hallucinations in Scene Text Spotting and Understanding

updated a collection 5 days ago

sy1998's activity

upvoted a paper 1 day ago

When Semantics Mislead Vision: Mitigating Large Multimodal Models Hallucinations in Scene Text Spotting and Understanding

Paper • 2506.05551 • Published 5 days ago • 3

upvoted a collection 5 days ago

EarthMind

The model, training, and evaluation data of EarthMind. • 4 items • Updated 5 days ago • 1

upvoted a paper 7 days ago

EarthMind: Towards Multi-Granular and Multi-Sensor Earth Observation with Large Multimodal Models

Paper • 2506.01667 • Published 8 days ago • 21

upvoted a paper 11 days ago

VidText: Towards Comprehensive Evaluation for Video Text Understanding

Paper • 2505.22810 • Published 13 days ago • 20

upvoted 4 papers 22 days ago

Video-XL: Extra-Long Vision Language Model for Hour-Scale Video Understanding

Paper • 2409.14485 • Published Sep 22, 2024 • 2

TextCtrl: Diffusion-based Scene Text Editing with Prior Guidance Control

Paper • 2410.10133 • Published Oct 14, 2024 • 1

Video-XL-Pro: Reconstructive Token Compression for Extremely Long Video Understanding

Paper • 2503.18478 • Published Mar 24 • 1

MLVU: A Comprehensive Benchmark for Multi-Task Long Video Understanding

Paper • 2406.04264 • Published Jun 6, 2024 • 2

upvoted a paper 6 months ago

MegaPairs: Massive Data Synthesis For Universal Multimodal Retrieval

Paper • 2412.14475 • Published Dec 19, 2024 • 55

upvoted a collection 8 months ago

Video-XL

5 items • Updated 22 days ago • 2