Remasking Discrete Diffusion Models with Inference-Time Scaling Paper • 2503.00307 • Published Mar 1, 2025 • 10
HoT: Highlighted Chain of Thought for Referencing Supporting Facts from Inputs Paper • 2503.02003 • Published Mar 3, 2025 • 48
ZeroBench: An Impossible Visual Benchmark for Contemporary Large Multimodal Models Paper • 2502.09696 • Published Feb 13, 2025 • 44
GRS-QA -- Graph Reasoning-Structured Question Answering Dataset Paper • 2411.00369 • Published Nov 1, 2024 • 7
Recursive Introspection: Teaching Language Model Agents How to Self-Improve Paper • 2407.18219 • Published Jul 25, 2024 • 3
VideoGameBunny: Towards vision assistants for video games Paper • 2407.15295 • Published Jul 21, 2024 • 22
Steering Llama 2 via Contrastive Activation Addition Paper • 2312.06681 • Published Dec 9, 2023 • 15