kaipeng

kpzhang996

upvoted 2 papers 2 months ago

Yume: An Interactive World Generation Model

Paper • 2507.17744 • Published Jul 23 • 85

π^3: Scalable Permutation-Equivariant Visual Geometry Learning

Paper • 2507.13347 • Published Jul 17 • 64

upvoted 3 papers 3 months ago

Neural-Driven Image Editing

Paper • 2507.05397 • Published Jul 7 • 26

Sekai: A Video Dataset towards World Exploration

Paper • 2506.15675 • Published Jun 18 • 64

A High-Quality Dataset and Reliable Evaluation for Interleaved Image-Text Generation

Paper • 2506.09427 • Published Jun 11 • 8

upvoted 2 papers 4 months ago

SridBench: Benchmark of Scientific Research Illustration Drawing of Image Generation Model

Paper • 2505.22126 • Published May 28 • 3

IA-T2I: Internet-Augmented Text-to-Image Generation

Paper • 2505.15779 • Published May 21 • 14

upvoted a paper 5 months ago

MDK12-Bench: A Multi-Discipline Benchmark for Evaluating Reasoning in Multimodal Large Language Models

Paper • 2504.05782 • Published Apr 8 • 3

upvoted 7 papers 6 months ago

CLS-RL: Image Classification with Rule-Based Reinforcement Learning

Paper • 2503.16188 • Published Mar 20 • 11

Improving Autoregressive Image Generation through Coarse-to-Fine Token Prediction

Paper • 2503.16194 • Published Mar 20 • 8

MPBench: A Comprehensive Multimodal Reasoning Benchmark for Process Errors Identification

Paper • 2503.12505 • Published Mar 16 • 11

PEBench: A Fictitious Dataset to Benchmark Machine Unlearning for Multimodal Large Language Models

Paper • 2503.12545 • Published Mar 16 • 6

Neighboring Autoregressive Modeling for Efficient Visual Generation

Paper • 2503.10696 • Published Mar 12 • 8

ProJudge: A Multi-Modal Multi-Discipline Benchmark and Instruction-Tuning Dataset for MLLM-based Process Judges

Paper • 2503.06553 • Published Mar 9 • 7

ARMOR v0.1: Empowering Autoregressive Multimodal Understanding Model with Interleaved Multimodal Generation via Asymmetric Synergy

Paper • 2503.06542 • Published Mar 9 • 7

upvoted 2 papers 10 months ago

ZipAR: Accelerating Autoregressive Image Generation through Spatial Locality

Paper • 2412.04062 • Published Dec 5, 2024 • 9

GATE OpenING: A Comprehensive Benchmark for Judging Open-ended Interleaved Image-Text Generation

Paper • 2411.18499 • Published Nov 27, 2024 • 18

upvoted 2 papers 11 months ago

CLEAR: Character Unlearning in Textual and Visual Modalities

Paper • 2410.18057 • Published Oct 23, 2024 • 209

TP-Eval: Tap Multimodal LLMs' Potential in Evaluation by Customizing Prompts

Paper • 2410.18071 • Published Oct 23, 2024 • 7

upvoted a paper 12 months ago

ZipVL: Efficient Large Vision-Language Models with Dynamic Token Sparsification and KV Cache Compression

Paper • 2410.08584 • Published Oct 11, 2024 • 12