Stabilizing Transformer Training by Preventing Attention Entropy Collapse • Paper • arXiv:2303.06296 • Published Mar 11, 2023
The Role of Entropy and Reconstruction in Multi-View Self-Supervised Learning • Paper • arXiv:2307.10907 • Published Jul 20, 2023
Position Prediction as an Effective Pretraining Strategy • Paper • arXiv:2207.07611 • Published Jul 15, 2022
DepthPro Models • Collection • Depth Pro: Sharp Monocular Metric Depth in Less Than a Second • 4 items
Parameters vs FLOPs: Scaling Laws for Optimal Sparsity for Mixture-of-Experts Language Models • Paper • arXiv:2501.12370 • Published Jan 2025
Attention is all you need for Videos: Self-attention based Video Summarization using Universal Transformers • Paper • arXiv:1906.02792 • Published Jun 6, 2019
FastVLM: Efficient Vision Encoding for Vision Language Models • Paper • arXiv:2412.13303 • Published Dec 17, 2024