arxiv:2507.12901

Agentar-DeepFinance-100K: A Large-Scale Financial Dataset via Systematic Chain-of-Thought Synthesis Optimization

Published on Jul 17

Authors:

Abstract

A large-scale financial reasoning dataset with optimized chain-of-thought synthesis improves model performance on financial benchmarks.

AI-generated summary

Recent advancements in large language models (LLMs) have demonstrated remarkable general reasoning capabilities, holding significant potential for applications in the financial domain, a field that requires robust and reliable reasoning. It has been demonstrated that distilling high-quality chain-of-thought (CoT) rationales from advanced general reasoning models offers a promising and efficient path to the financial reasoning model. However, existing CoT synthesis methods suffer from shallow CoT sampling, leaving the question of how to construct a well-designed knowledge space for finance reasoning unexplored. In this paper, we present Agentar-DeepFinance-100K, a large-scale financial reasoning dataset characterized by its systematic CoT synthesis optimization. We first introduce a comprehensive CoT synthesis pipeline featuring Multi-perspective Knowledge Extraction (MKE) and Self-Corrective Rewriting (SCR) to generate exhaustive and deep financial reasoning trajectories. Furthermore, a systematic investigation, termed CoT Cube, is conducted to analyze critical factors that influence CoT effectiveness, such as necessity, length and synthesizer, yielding valuable insights for high-quality financial CoT construction. Experiments demonstrate that models trained on our Agentar-DeepFinance-100K achieve significant improvements on financial benchmarks. We publicly release Agentar-DeepFinance-100K , hoping to advance the research in financial reasoning models.

View arXiv page View PDF Add to collection

Community

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment

Upvote

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2507.12901 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2507.12901 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2507.12901 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.