Don't throw away your value model! Making PPO even better via Value-Guided Monte-Carlo Tree Search decoding Paper • 2309.15028 • Published Sep 26, 2023
MathVista: Evaluating Mathematical Reasoning of Foundation Models in Visual Contexts Paper • 2310.02255 • Published Oct 3, 2023
Crystal: Introspective Reasoners Reinforced with Self-Feedback Paper • 2310.04921 • Published Oct 7, 2023
NaturalProofs: Mathematical Theorem Proving in Natural Language Paper • 2104.01112 • Published Mar 24, 2021
Minds versus Machines: Rethinking Entailment Verification with Language Models Paper • 2402.03686 • Published Feb 6, 2024
NaturalProver: Grounded Mathematical Proof Generation with Language Models Paper • 2205.12910 • Published May 25, 2022
Rainier: Reinforced Knowledge Introspector for Commonsense Question Answering Paper • 2210.03078 • Published Oct 6, 2022
Unpacking DPO and PPO: Disentangling Best Practices for Learning from Preference Feedback Paper • 2406.09279 • Published Jun 13, 2024
AI as Humanity's Salieri: Quantifying Linguistic Creativity of Language Models via Systematic Attribution of Machine Text against Web Text Paper • 2410.04265 • Published Oct 5, 2024
Establishing Task Scaling Laws via Compute-Efficient Model Ladders Paper • 2412.04403 • Published Dec 5, 2024
Bridging the Data Provenance Gap Across Text, Speech and Video Paper • 2412.17847 • Published about 1 month ago
Fin-Fact: A Benchmark Dataset for Multimodal Financial Fact Checking and Explanation Generation Paper • 2309.08793 • Published Sep 15, 2023
Evaluating Language Models as Synthetic Data Generators Paper • 2412.03679 • Published Dec 4, 2024
Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision, Language, Audio, and Action Paper • 2312.17172 • Published Dec 28, 2023
Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models Paper • 2409.17146 • Published Sep 25, 2024