
Shengqiong Wu

ChocoWu


https://chocowu.github.io/


AI & ML interests

Large Language Model, Multimodal Learning, Scene Graph Generation

Recent Activity

updated a dataset 1 day ago

General-Level/General-Bench-Openset

updated a dataset 2 days ago

General-Level/General-Bench-Closeset-Scoped

updated a dataset 3 days ago

General-Level/General-Bench-Closeset



authored a paper 2 months ago

On Path to Multimodal Generalist: General-Level and General-Bench

Paper • 2505.04620 • Published May 7 • 83

authored 4 papers 3 months ago

VistaDPO: Video Hierarchical Spatial-Temporal Direct Preference Optimization for Large Video Models

Paper • 2504.13122 • Published Apr 17 • 21

LayoutLLM-T2I: Eliciting Layout Guidance from LLM for Text-to-Image Generation

Paper • 2308.05095 • Published Aug 9, 2023

JavisDiT: Joint Audio-Video Diffusion Transformer with Hierarchical Spatio-Temporal Prior Synchronization

Paper • 2503.23377 • Published Mar 30 • 58

Any2Caption: Interpreting Any Condition to Caption for Controllable Video Generation

Paper • 2503.24379 • Published Mar 31 • 77

authored a paper 4 months ago

Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey

Paper • 2503.12605 • Published Mar 16 • 36

authored a paper 6 months ago

Vitron: A Unified Pixel-level Vision LLM for Understanding, Generating, Segmenting, Editing

Paper • 2412.19806 • Published Oct 8, 2024 • 2

authored a paper about 1 year ago

OMG-LLaVA: Bridging Image-level, Object-level, Pixel-level Reasoning and Understanding

Paper • 2406.19389 • Published Jun 27, 2024 • 55

authored a paper almost 2 years ago

NExT-GPT: Any-to-Any Multimodal LLM

Paper • 2309.05519 • Published Sep 11, 2023 • 78