Cross-Modal Safety Alignment: Is textual unlearning all you need? Paper • 2406.02575 • Published May 27, 2024 • 1
Model Tampering Attacks Enable More Rigorous Evaluations of LLM Capabilities Paper • 2502.05209 • Published Feb 3 • 1
MORSE-500: A Programmatically Controllable Video Benchmark to Stress-Test Multimodal Reasoning Paper • 2506.05523 • Published 6 days ago • 31
Textual Steering Vectors Can Improve Visual Understanding in Multimodal Large Language Models Paper • 2505.14071 • Published 23 days ago • 1
FoNE: Precise Single-Token Number Embeddings via Fourier Features Paper • 2502.09741 • Published Feb 13 • 14
Generalization Differences between End-to-End and Neuro-Symbolic Vision-Language Reasoning Systems Paper • 2210.15037 • Published Oct 26, 2022 • 1
TLDR: Token-Level Detective Reward Model for Large Vision Language Models Paper • 2410.04734 • Published Oct 7, 2024 • 17
MEGA-Bench: Scaling Multimodal Evaluation to over 500 Real-World Tasks Paper • 2410.10563 • Published Oct 14, 2024 • 39
TLDR: Token-Level Detective Reward Model for Large Vision Language Models Paper • 2410.04734 • Published Oct 7, 2024 • 17
From Pixels to Prose: A Large Dataset of Dense Image Captions Paper • 2406.10328 • Published Jun 14, 2024 • 18
Pre-trained Large Language Models Use Fourier Features to Compute Addition Paper • 2406.03445 • Published Jun 5, 2024
Transformers Learn Higher-Order Optimization Methods for In-Context Learning: A Study with Linear Models Paper • 2310.17086 • Published Oct 26, 2023 • 1
IsoBench: Benchmarking Multimodal Foundation Models on Isomorphic Representations Paper • 2404.01266 • Published Apr 1, 2024 • 3