PUG: Photorealistic and Semantically Controllable Synthetic Data for Representation Learning Paper • 2308.03977 • Published Aug 8, 2023
LongVU: Spatiotemporal Adaptive Compression for Long Video-Language Understanding Paper • 2410.17434 • Published Oct 22, 2024
Improving the Scaling Laws of Synthetic Data with Deliberate Practice Paper • 2502.15588 • Published Feb 21, 2025
A Picture is Worth More Than 77 Text Tokens: Evaluating CLIP-Style Models on Dense Captions Paper • 2312.08578 • Published Dec 14, 2023
Predicting masked tokens in stochastic locations improves masked image modeling Paper • 2308.00566 • Published Jul 31, 2023