
Zuhao Yang

mwxely


https://mwxely.github.io/

AI & ML interests

Large Multimodal Models

Recent Activity

updated a dataset 29 days ago

mwxely/longvt-parquet-fixed



upvoted a paper 14 days ago

OneVision-Encoder: Codec-Aligned Sparsity as a Foundational Principle for Multimodal Intelligence

Paper • 2602.08683 • Published 20 days ago • 49

upvoted a paper 25 days ago

OmniSIFT: Modality-Asymmetric Token Compression for Efficient Omni-modal Large Language Models

Paper • 2602.04804 • Published 25 days ago • 46

upvoted a paper about 1 month ago

XR: Cross-Modal Agents for Composed Image Retrieval

Paper • 2601.14245 • Published Jan 20 • 9

upvoted 3 papers about 2 months ago

DeepResearchEval: An Automated Framework for Deep Research Task Construction and Agentic Evaluation

Paper • 2601.09688 • Published Jan 14 • 126

On the Role of Discreteness in Diffusion LLMs

Paper • 2512.22630 • Published Dec 27, 2025 • 18

mHC: Manifold-Constrained Hyper-Connections

Paper • 2512.24880 • Published Dec 31, 2025 • 312

upvoted 4 papers 2 months ago

Wan-Move: Motion-controllable Video Generation via Latent Trajectory Guidance

Paper • 2512.08765 • Published Dec 9, 2025 • 133

EgoX: Egocentric Video Generation from a Single Exocentric Video

Paper • 2512.08269 • Published Dec 9, 2025 • 119

Robust-R1: Degradation-Aware Reasoning for Robust Visual Understanding

Paper • 2512.17532 • Published Dec 19, 2025 • 67

The Prism Hypothesis: Harmonizing Semantic and Pixel Representations via Unified Autoencoding

Paper • 2512.19693 • Published Dec 22, 2025 • 66

upvoted 3 papers 3 months ago

Agent Learning via Early Experience

Paper • 2510.08558 • Published Oct 9, 2025 • 273

Kandinsky 5.0: A Family of Foundation Models for Image and Video Generation

Paper • 2511.14993 • Published Nov 19, 2025 • 231

Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm

Paper • 2511.04570 • Published Nov 6, 2025 • 240

upvoted 3 collections 3 months ago

Multimodal Agent

139 items • Updated 22 days ago • 3

AI Paper of the Day

A collection of papers that I think are interesting, one added each day • 608 items • Updated about 2 hours ago • 83

LongVT-HF_Daily_Paper

1 item • Updated Dec 1, 2025 • 1

upvoted 2 papers 3 months ago

Script: Graph-Structured and Query-Conditioned Semantic Token Pruning for Multimodal Large Language Models

Paper • 2512.01949 • Published Dec 1, 2025 • 9

LongVT: Incentivizing "Thinking with Long Videos" via Native Tool Calling

Paper • 2511.20785 • Published Nov 25, 2025 • 185

upvoted a collection 3 months ago

LongVT

8 items • Updated Dec 11, 2025 • 9

upvoted a paper 3 months ago

OpenMMReasoner: Pushing the Frontiers for Multimodal Reasoning with an Open and General Recipe

Paper • 2511.16334 • Published Nov 20, 2025 • 93