An Image is Worth 32 Tokens for Reconstruction and Generation Paper • 2406.07550 • Published Jun 11 • 55 • 20
VoCo-LLaMA: Towards Vision Compression with Large Language Models Paper • 2406.12275 • Published Jun 18 • 29 • 10