LongVU: Spatiotemporal Adaptive Compression for Long Video-Language Understanding • Paper • arXiv:2410.17434 • Published Oct 22, 2024
Patch n' Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution • Paper • arXiv:2307.06304 • Published Jul 12, 2023
VampNet: Music Generation via Masked Acoustic Token Modeling • Paper • arXiv:2307.04686 • Published Jul 10, 2023
High-Fidelity Audio Compression with Improved RVQGAN • Paper • arXiv:2306.06546 • Published Jun 11, 2023