Traceable Evidence Enhanced Visual Grounded Reasoning: Evaluation and Methodology Paper • 2507.07999 • Published 18 days ago • 45
SPIRAL: Self-Play on Zero-Sum Games Incentivizes Reasoning via Multi-Agent Multi-Turn Reinforcement Learning Paper • 2506.24119 • Published 28 days ago • 46
GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning Paper • 2507.01006 • Published 27 days ago • 200
VMoBA: Mixture-of-Block Attention for Video Diffusion Models Paper • 2506.23858 • Published 29 days ago • 30
CyberV: Cybernetics for Test-time Scaling in Video Understanding Paper • 2506.07971 • Published Jun 9 • 4
On Path to Multimodal Generalist: General-Level and General-Bench Paper • 2505.04620 • Published May 7 • 82
PVUW 2025 Challenge Report: Advances in Pixel-level Understanding of Complex Videos in the Wild Paper • 2504.11326 • Published Apr 15 • 6
The Scalability of Simplicity: Empirical Analysis of Vision-Language Learning with a Single Transformer Paper • 2504.10462 • Published Apr 14 • 15
Pixel-SAIL: Single Transformer For Pixel-Grounded Understanding Paper • 2504.10465 • Published Apr 14 • 27
OMG-Seg: Is One Model Good Enough For All Segmentation? Paper • 2401.10229 • Published Jan 18, 2024 • 1
Finally, a Replacement for BERT: Introducing ModernBERT Article • By bclavie and 14 others • Dec 19, 2024 • 670
Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos Paper • 2501.04001 • Published Jan 7 • 47
Multimodal Latent Language Modeling with Next-Token Diffusion Paper • 2412.08635 • Published Dec 11, 2024 • 46
DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for Customized Manga Generation Paper • 2412.07589 • Published Dec 10, 2024 • 49
HumanEdit: A High-Quality Human-Rewarded Dataset for Instruction-based Image Editing Paper • 2412.04280 • Published Dec 5, 2024 • 14
Janus: Decoupling Visual Encoding for Unified Multimodal Understanding and Generation Paper • 2410.13848 • Published Oct 17, 2024 • 35