FastVLM: Efficient Vision Encoding for Vision Language Models • arXiv:2412.13303 • Published Dec 17, 2024
Scaling Smart: Accelerating Large Language Model Pre-training with Small Model Initialization • arXiv:2409.12903 • Published Sep 19, 2024
DataComp-LM: In search of the next generation of training sets for language models • arXiv:2406.11794 • Published Jun 17, 2024
CLIP with Quality Captions: A Strong Pretraining for Vision Tasks • arXiv:2405.08911 • Published May 14, 2024
SAM-CLIP: Merging Vision Foundation Models towards Semantic and Spatial Understanding • arXiv:2310.15308 • Published Oct 23, 2023
Reinforce Data, Multiply Impact: Improved Model Accuracy and Robustness with Dataset Reinforcement • arXiv:2303.08983 • Published Mar 15, 2023
APE: Aligning Pretrained Encoders to Quickly Learn Aligned Multimodal Representations • arXiv:2210.03927 • Published Oct 8, 2022
CLIP meets Model Zoo Experts: Pseudo-Supervision for Visual Enhancement • arXiv:2310.14108 • Published Oct 21, 2023
Weight subcloning: direct initialization of transformers using larger pretrained ones • arXiv:2312.09299 • Published Dec 14, 2023
Technical Report on the CleverHans v2.1.0 Adversarial Examples Library • arXiv:1610.00768 • Published Oct 3, 2016
MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training • arXiv:2311.17049 • Published Nov 28, 2023
CatLIP: CLIP-level Visual Recognition Accuracy with 2.7x Faster Pre-training on Web-scale Image-Text Data • arXiv:2404.15653 • Published Apr 24, 2024