MLLMs

university

Activity Feed

AI & ML interests

None defined yet.

Recent Activity

jiuhai

updated a dataset about 1 month ago

umd-vt-nyu/code

Updated Jun 22 • 32

jiuhai

published a dataset about 1 month ago

umd-vt-nyu/code

Updated Jun 22 • 32

jiuhai

published a model about 1 month ago

umd-vt-nyu/soda

Updated Jun 22

zhiyang1

authored a paper about 1 month ago

LaTtE-Flow: Layerwise Timestep-Expert Flow-based Transformer

Paper • 2506.06952 • Published Jun 8 • 10

jiuhai

updated 3 models 2 months ago

umd-vt-nyu/JH_dc-vae-f32c32-sana-1.0-768_patch-1_epoch-64_group-7_fusion_residual_attn

Updated May 24

umd-vt-nyu/flow_siglip2_512_sana_512_1e4_64token_2ndlast_sstk_16

Updated May 22

umd-vt-nyu/JH_dc-vae-f32c32-sana-1.0-768_patch-1_baseline_fusion

Updated May 21

xcpan

authored a paper 2 months ago

Exploring the Deep Fusion of Large Language Models and Diffusion Transformers for Text-to-Image Synthesis

Paper • 2505.10046 • Published May 15 • 9

jiuhai

authored a paper 2 months ago

BLIP3-o: A Family of Fully Open Unified Multimodal Models-Architecture, Training and Dataset

Paper • 2505.09568 • Published May 14 • 95

xcpan

authored 3 papers 2 months ago

PISA Experiments: Exploring Physics Post-Training for Video Diffusion Models by Watching Stuff Drop

Paper • 2503.09595 • Published Mar 12

Transfer between Modalities with MetaQueries

Paper • 2504.06256 • Published Apr 8 • 1

BLIP3-o: A Family of Fully Open Unified Multimodal Models-Architecture, Training and Dataset

Paper • 2505.09568 • Published May 14 • 95

jiuhai

authored a paper 3 months ago

ColorBench: Can VLMs See and Understand the Colorful World? A Comprehensive Benchmark for Color Perception, Reasoning, and Robustness

Paper • 2504.10514 • Published Apr 10 • 47

jiuhai

authored a paper 8 months ago

Florence-VL: Enhancing Vision-Language Models with Generative Vision Encoder and Depth-Breadth Fusion

Paper • 2412.04424 • Published Dec 5, 2024 • 64

xcpan

authored a paper about 1 year ago

Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs

Paper • 2406.16860 • Published Jun 24, 2024 • 61

jiuhai

authored 2 papers over 1 year ago

Automated Data Curation for Robust Language Model Fine-Tuning

Paper • 2403.12776 • Published Mar 19, 2024

ODIN: Disentangled Reward Mitigates Hacking in RLHF

Paper • 2402.07319 • Published Feb 11, 2024 • 14

xcpan

authored 3 papers over 1 year ago

Image Sculpting: Precise Object Editing with 3D Geometry Control

Paper • 2401.01702 • Published Jan 2, 2024 • 21

Leveraging Unimodal Self-Supervised Learning for Multimodal Audio-Visual Speech Recognition

Paper • 2203.07996 • Published Feb 24, 2022

Synthesizing Coherent Story with Auto-Regressive Latent Diffusion Models

Paper • 2211.10950 • Published Nov 20, 2022

Team members 3
