
Xiangtai Li

LXT

https://lxtgh.github.io/

AI & ML interests

Computer Vision, Multi-Modal Understanding, Generative AI

Recent Activity



upvoted a paper 6 days ago

SAMTok: Representing Any Mask with Two Words

Paper • 2601.16093 • Published 7 days ago • 40

submitted a paper to Daily Papers 6 days ago

SAMTok: Representing Any Mask with Two Words

Paper • 2601.16093 • Published 7 days ago • 40

upvoted 2 papers 13 days ago

Molmo2: Open Weights and Data for Vision-Language Models with Video Understanding and Grounding

Paper • 2601.10611 • Published 14 days ago • 26

STEP3-VL-10B Technical Report

Paper • 2601.09668 • Published 15 days ago • 188

upvoted 2 papers 16 days ago

BabyVision: Visual Reasoning Beyond Language

Paper • 2601.06521 • Published 19 days ago • 191

Watching, Reasoning, and Searching: A Video Deep Research Benchmark on Open Web for Agentic Video Reasoning

Paper • 2601.06943 • Published 18 days ago • 207

upvoted 2 papers about 1 month ago

Vision-Language-Action Models for Autonomous Driving: Past, Present, and Future

Paper • 2512.16760 • Published Dec 18, 2025 • 14

LLaDA2.0: Scaling Up Diffusion Language Models to 100B

Paper • 2512.15745 • Published Dec 10, 2025 • 81

liked a model about 1 month ago

WeiChow/EditMGT

Image-to-Image • Updated Dec 20, 2025 • 8

upvoted a paper about 1 month ago

RecTok: Reconstruction Distillation along Rectified Flow

Paper • 2512.13421 • Published Dec 15, 2025 • 5

authored 10 papers about 2 months ago

DenseWorld-1M: Towards Detailed Dense Grounded Caption in the Real World

Paper • 2506.24102 • Published Jun 30, 2025

One Flight Over the Gap: A Survey from Perspective to Panoramic Vision

Paper • 2509.04444 • Published Sep 4, 2025

VimoRAG: Video-based Retrieval-augmented 3D Motion Generation for Motion Language Models

Paper • 2508.12081 • Published Aug 16, 2025

DiT360: High-Fidelity Panoramic Image Generation via Hybrid Training

Paper • 2510.11712 • Published Oct 13, 2025 • 31

Grasp Any Region: Towards Precise, Contextual Pixel Understanding for Multimodal LLMs

Paper • 2510.18876 • Published Oct 21, 2025 • 37

Are Video Models Ready as Zero-Shot Reasoners? An Empirical Study with the MME-CoF Benchmark

Paper • 2510.26802 • Published Oct 30, 2025 • 34

Visual Spatial Tuning

Paper • 2511.05491 • Published Nov 7, 2025 • 52
