Wider or Deeper? Scaling LLM Inference-Time Compute with Adaptive Branching Tree Search Paper • 2503.04412 • Published Mar 6, 2025 • 5
A Practical Two-Stage Recipe for Mathematical LLMs: Maximizing Accuracy with SFT and Efficiency with Reinforcement Learning Paper • 2507.08267 • Published Jul 11, 2025 • 10
Heron-Bench: A Benchmark for Evaluating Vision Language Models in Japanese Paper • 2404.07824 • Published Apr 11, 2024 • 3
NuScenes-MQA: Integrated Evaluation of Captions and QA for Autonomous Driving Datasets using Markup Annotations Paper • 2312.06352 • Published Dec 11, 2023 • 1
Evaluation of Large Language Models for Decision Making in Autonomous Driving Paper • 2312.06351 • Published Dec 11, 2023 • 6