Libra-Leaderboard: Towards Responsible AI through a Balanced Leaderboard of Safety and Capability • arXiv:2412.18551 • Published Dec 24, 2024
Language Models Can See Better: Visual Contrastive Decoding For LLM Multimodal Reasoning • arXiv:2502.11751 • Published Feb 17, 2025
SFT or RL? An Early Investigation into Training R1-Like Reasoning Large Vision-Language Models • arXiv:2504.11468 • Published Apr 10, 2025
ViLBench: A Suite for Vision-Language Process Reward Modeling • arXiv:2503.20271 • Published Mar 26, 2025
AdaVAE: Exploring Adaptive GPT-2s in Variational Auto-Encoders for Language Modeling • arXiv:2205.05862 • Published May 12, 2022
A Preliminary Study of o1 in Medicine: Are We Closer to an AI Doctor? • arXiv:2409.15277 • Published Sep 23, 2024
VHELM: A Holistic Evaluation of Vision Language Models • arXiv:2410.07112 • Published Oct 9, 2024
Code-Survey: An LLM-Driven Methodology for Analyzing Large-Scale Codebases • arXiv:2410.01837 • Published Sep 24, 2024
Tuning LayerNorm in Attention: Towards Efficient Multi-Modal LLM Finetuning • arXiv:2312.11420 • Published Dec 18, 2023
Eagle and Finch: RWKV with Matrix-Valued States and Dynamic Recurrence • arXiv:2404.05892 • Published Apr 8, 2024
MJ-Bench: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation? • arXiv:2407.04842 • Published Jul 5, 2024
What If We Recaption Billions of Web Images with LLaMA-3? • arXiv:2406.08478 • Published Jun 12, 2024
How Many Unicorns Are in This Image? A Safety Evaluation Benchmark for Vision LLMs • arXiv:2311.16101 • Published Nov 27, 2023
ReSee: Responding through Seeing Fine-grained Visual Knowledge in Open-domain Dialogue • arXiv:2305.13602 • Published May 23, 2023