CALM Before the STORM: Unlocking Native Reasoning for Optimization Modeling
Abstract
The CALM framework uses expert interventions to refine LRM reasoning on optimization modeling tasks, achieving high accuracy while modifying far fewer tokens than traditional adaptation methods.
Large Reasoning Models (LRMs) have demonstrated strong capabilities in complex multi-step reasoning, opening new opportunities for automating optimization modeling. However, existing domain adaptation methods, originally designed for earlier instruction-tuned models, often fail to exploit the advanced reasoning patterns of modern LRMs; in particular, we show that direct fine-tuning on traditional non-reflective datasets yields only limited gains. To fully leverage LRMs' inherent reasoning abilities, we propose CALM (Corrective Adaptation with Lightweight Modification), a framework that progressively refines LRMs within their native reasoning modes for optimization modeling tasks. In CALM, an expert intervener identifies reasoning flaws and provides concise corrective hints, which the LRM incorporates to produce improved reasoning trajectories. These interventions modify fewer than 2.6% of generated tokens, yet they yield high-quality data for soft adaptation through supervised fine-tuning. The adapted model is then further improved through reinforcement learning. Building on CALM, we develop STORM (Smart Thinking Optimization Reasoning Model), a 4B-parameter LRM that achieves a new state-of-the-art average accuracy of 68.9% across five popular optimization modeling benchmarks, matching the performance of a 671B-parameter LRM. These results demonstrate that dynamic, hint-based data synthesis both preserves and amplifies the native reasoning patterns of modern LRMs, offering a more effective and scalable path toward expert-level performance on challenging optimization modeling tasks.
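To make the pipeline concrete, here is a minimal sketch of the hint-based data-synthesis loop the abstract describes. It is illustrative only: the `lrm`, `intervener`, and `verifier` interfaces and their methods are hypothetical stand-ins, not the paper's actual implementation.

```python
def calm_synthesize(problem, lrm, intervener, verifier, max_rounds=3):
    """Produce one corrected reasoning trajectory, or None if unrecoverable.

    All objects passed in are hypothetical interfaces used for illustration.
    """
    trajectory = lrm.generate(problem)  # native reasoning trace + model output
    for _ in range(max_rounds):
        if verifier.is_correct(problem, trajectory):
            return trajectory  # verified trajectory joins the SFT corpus
        # The expert intervener locates the first reasoning flaw and writes a
        # concise corrective hint rather than rewriting the solution itself.
        flaw_pos, hint = intervener.diagnose(problem, trajectory)
        # Resume generation from the flaw under the hint; per the paper, such
        # interventions modify fewer than 2.6% of the generated tokens.
        prefix = trajectory[:flaw_pos]
        trajectory = prefix + lrm.generate(problem, prefix=prefix, hint=hint)
    return trajectory if verifier.is_correct(problem, trajectory) else None
```

Trajectories that pass verification would form the supervised fine-tuning corpus; per the abstract, the adapted model is then further improved with reinforcement learning to produce STORM.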
Community
We introduce STORM, a 4B-parameter model that achieves a new SOTA across 5 optimization modeling benchmarks (68.9% avg acc), matching the performance of a 671B model.
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- StepORLM: A Self-Evolving Framework With Generative Process Supervision For Operations Research Language Models (2025)
- THOR: Tool-Integrated Hierarchical Optimization via RL for Mathematical Reasoning (2025)
- OptiMind: Teaching LLMs to Think Like Optimization Experts (2025)
- Your Models Have Thought Enough: Training Large Reasoning Models to Stop Overthinking (2025)
- Tagging the Thought: Unlocking Personalization Reasoning via Reinforcement Learning (2025)
- Datarus-R1: An Adaptive Multi-Step Reasoning LLM for Automated Data Analysis (2025)
- Socratic-Zero: Bootstrapping Reasoning via Data-Free Agent Co-evolution (2025)