Papers
arxiv:2510.06217

TaTToo: Tool-Grounded Thinking PRM for Test-Time Scaling in Tabular Reasoning

Published on Oct 7
· Submitted by Jiaru Zou on Oct 8
· Amazon
Authors: Pan Lu, et al.

Abstract

TaTToo, a novel table-grounded Process Reward Model, enhances tabular reasoning by explicitly addressing table-specific operations and integrating tool-based verification, leading to significant performance improvements over existing PRMs.

AI-generated summary

Process Reward Models (PRMs) have recently emerged as a powerful framework for enhancing the reasoning capabilities of large reasoning models (LRMs), particularly in the context of test-time scaling (TTS). However, their potential for supervising LRMs on tabular reasoning domains remains underexplored. Through detailed empirical analyses, we identify that existing PRMs, though widely adopted for supervising text-only reasoning steps, struggle with table-specific operations such as sub-table retrieval and schema interaction, leading to critical performance bottlenecks. To address this limitation, we propose TaTToo, a novel table-grounded PRM framework that (i) reasons explicitly over tabular reasoning steps and (ii) integrates tool-based verification to provide precise reward supervision. Concretely, we first design a scalable data curation pipeline that constructs over 60k high-quality step-level annotations by integrating table verification rationales with tool-based executions. Building on the collected data, we train TaTToo with a dual-stage paradigm: cold-start supervised fine-tuning to capture tool-use reasoning patterns, followed by reinforcement learning with tool-grounded reward shaping to align our model with table-based verification. We provide a comprehensive evaluation of the policy improvement induced by our newly designed PRM. Across 5 challenging tabular reasoning benchmarks covering numerical reasoning, fact-checking, and data analysis, TaTToo improves downstream policy LRMs by 30.9% at inference, surpasses strong PRM baselines such as Qwen-2.5-Math-PRM-72B with only 8B parameters, and demonstrates strong generalizability across diverse TTS strategies.

Community

Paper submitter
•
edited 4 days ago

🚀 TATTOO – a novel tool-grounded process reward model (PRM) for tabular reasoning

🔎 What we do:

  • Conduct detailed pilot analyses to diagnose the performance bottlenecks of prior PRMs on table-specific step-supervision (retrieval & schema interaction).

  • Build a 60k+ instance dataset with tool-augmented, step-level verification.

  • Train with a dual-stage agentic paradigm (cold-start supervised fine-tuning, then reinforcement learning with tool-grounded reward shaping) to better leverage tools during per-step evaluation.

  • Scale effectively under various TTS strategies, including Best-of-N, Beam Search, and DVTS.

💡 Why it matters:
TATTOO shows that tool-integrated thinking PRMs can supervise both thinking and table operations, offering more reliable reward signals for better reasoning and verification.
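To make the test-time-scaling setting concrete, here is a minimal Best-of-N sketch of how a step-level PRM can select among sampled solutions. All function names below are hypothetical placeholders, not the paper's API; the aggregation choice (minimum step score) is one common convention, since a single bad step can invalidate an otherwise plausible solution.

```python
from typing import Callable, List

def best_of_n(
    candidates: List[List[str]],
    prm_score: Callable[[List[str], int], float],
) -> List[str]:
    """Pick the candidate solution with the best aggregated step-level reward.

    candidates: N sampled solutions, each a list of reasoning steps.
    prm_score:  step-level reward model; returns the score of step i
                given the full list of steps (a stand-in for a real PRM).
    """
    def aggregate(steps: List[str]) -> float:
        # Aggregate step scores by their minimum: the solution is only as
        # strong as its weakest verified step.
        return min(prm_score(steps, i) for i in range(len(steps)))

    return max(candidates, key=aggregate)

if __name__ == "__main__":
    # Toy stand-in scorer (rewards shorter steps) purely for illustration.
    toy_prm = lambda steps, i: 1.0 / (1 + len(steps[i]))
    cands = [["step one", "a very long second step"], ["s1", "s2"]]
    print(best_of_n(cands, toy_prm))
```

Beam Search and DVTS differ mainly in *when* the PRM scores are consulted (per step during decoding, or over diverse subtrees) rather than in this selection logic.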

👉 Paper: https://arxiv.org/pdf/2510.06217


