Papers
arxiv:2510.06217

TaTToo: Tool-Grounded Thinking PRM for Test-Time Scaling in Tabular Reasoning

Published on Oct 7
· Submitted by Jiaru Zou on Oct 8
· Amazon
Authors: Pan Lu, et al.

Abstract

TaTToo, a novel table-grounded Process Reward Model, enhances tabular reasoning by explicitly addressing table-specific operations and integrating tool-based verification, leading to significant performance improvements over existing PRMs.

AI-generated summary

Process Reward Models (PRMs) have recently emerged as a powerful framework for enhancing the reasoning capabilities of large reasoning models (LRMs), particularly in the context of test-time scaling (TTS). However, their potential for supervising LRMs on tabular reasoning domains remains underexplored. Through detailed empirical analyses, we identify that existing PRMs, though widely adopted for supervising text-only reasoning steps, struggle with table-specific operations such as sub-table retrieval and schema interaction, leading to critical performance bottlenecks. To address this limitation, we propose TaTToo, a novel table-grounded PRM framework that (i) reasons explicitly over tabular reasoning steps and (ii) integrates tool-based verification to provide precise reward supervision. Concretely, we first design a scalable data curation pipeline that constructs over 60k high-quality step-level annotations by integrating table verification rationales with tool-based executions. Building on the collected data, we train TaTToo with a dual-stage paradigm: cold-start supervised fine-tuning to capture tool-use reasoning patterns, followed by reinforcement learning with tool-grounded reward shaping to align our model with table-based verification. We provide a comprehensive evaluation of the policy improvement induced by our newly designed PRM. Across 5 challenging tabular reasoning benchmarks covering numerical reasoning, fact-checking, and data analysis, TaTToo improves downstream policy LRMs by 30.9% at inference, surpasses strong PRM baselines such as Qwen-2.5-Math-PRM-72B with only 8B parameters, and demonstrates strong generalizability across diverse TTS strategies.

Community

Paper submitter
•
edited 4 days ago

🚀 TATTOO – a novel tool-grounded process reward model (PRM) for tabular reasoning

🔎 What we do:

  • Conduct detailed pilot analyses to diagnose the performance bottlenecks of prior PRMs on table-specific step-supervision (retrieval & schema interaction).

  • Build a 60k+ instance dataset with tool-augmented, step-level verification.

  • Train with a dual-stage agentic paradigm (cold-start supervised fine-tuning, then reinforcement learning with tool-grounded reward shaping) to better leverage tools during per-step evaluation.

  • Scale effectively under various TTS strategies, including Best-of-N, Beam Search, and DVTS.

💡 Why it matters:
TATTOO shows that tool-integrated thinking PRMs can supervise both thinking and table operations, offering more reliable reward signals for better reasoning and verification.
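To make the test-time-scaling setting concrete, here is a minimal Best-of-N sketch of how a step-level PRM can select among sampled solutions. All function names below are hypothetical placeholders, not the paper's API; the aggregation choice (minimum step score) is one common convention, since a single bad step can invalidate an otherwise plausible solution.

```python
from typing import Callable, List

def best_of_n(
    candidates: List[List[str]],
    prm_score: Callable[[List[str], int], float],
) -> List[str]:
    """Pick the candidate solution with the best aggregated step-level reward.

    candidates: N sampled solutions, each a list of reasoning steps.
    prm_score:  step-level reward model; returns the score of step i
                given the full list of steps (a stand-in for a real PRM).
    """
    def aggregate(steps: List[str]) -> float:
        # Aggregate step scores by their minimum: the solution is only as
        # strong as its weakest verified step.
        return min(prm_score(steps, i) for i in range(len(steps)))

    return max(candidates, key=aggregate)

if __name__ == "__main__":
    # Toy stand-in scorer (rewards shorter steps) purely for illustration.
    toy_prm = lambda steps, i: 1.0 / (1 + len(steps[i]))
    cands = [["step one", "a very long second step"], ["s1", "s2"]]
    print(best_of_n(cands, toy_prm))
```

Beam Search and DVTS differ mainly in *when* the PRM scores are consulted (per step during decoding, or over diverse subtrees) rather than in this selection logic.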

👉 Paper: https://arxiv.org/pdf/2510.06217


