Bootstrapping World Models from Dynamics Models in Multimodal Foundation Models Paper • 2506.06006 • Published 5 days ago • 10
Inference-Time Hyper-Scaling with KV Cache Compression Paper • 2506.05345 • Published 5 days ago • 25
Unleashing the Reasoning Potential of Pre-trained LLMs by Critique Fine-Tuning on One Problem Paper • 2506.03295 • Published 7 days ago • 17
REASONING GYM: Reasoning Environments for Reinforcement Learning with Verifiable Rewards Paper • 2505.24760 • Published 11 days ago • 61
MMLongBench: Benchmarking Long-Context Vision-Language Models Effectively and Thoroughly Paper • 2505.10610 • Published 26 days ago • 53
Lost in Time: Clock and Calendar Understanding Challenges in Multimodal LLMs Paper • 2502.05092 • Published Feb 7 • 8
🔍 Interpretability & Analysis of LMs Collection Outstanding research in LM interpretability and evaluation, summarized • 116 items • Updated 5 days ago • 105
Steering Knowledge Selection Behaviours in LLMs via SAE-Based Representation Engineering Paper • 2410.15999 • Published Oct 21, 2024 • 20
DeCoRe: Decoding by Contrasting Retrieval Heads to Mitigate Hallucinations Paper • 2410.18860 • Published Oct 24, 2024 • 11
FLARE: Faithful Logic-Aided Reasoning and Exploration Paper • 2410.11900 • Published Oct 14, 2024 • 4
A Simple and Effective L_2 Norm-Based Strategy for KV Cache Compression Paper • 2406.11430 • Published Jun 17, 2024 • 24
No Train No Gain: Revisiting Efficient Training Algorithms For Transformer-based Language Models Paper • 2307.06440 • Published Jul 12, 2023 • 3
Article The Open Medical-LLM Leaderboard: Benchmarking Large Language Models in Healthcare By aaditya and 2 others • Apr 19, 2024 • 164
Open LLM Leaderboard best models ❤️🔥 Collection A daily uploaded list of models with best evaluations on the LLM leaderboard: • 65 items • Updated Mar 20 • 603