Can a Single Model Master Both Multi-turn Conversations and Tool Use? CALM: A Unified Conversational Agentic Language Model
Abstract
Large Language Models (LLMs) with API-calling capabilities have enabled the creation of effective Language Agents (LAs), while also revolutionizing the conventional task-oriented dialogue (TOD) paradigm. However, current approaches face a critical dilemma: TOD systems are often trained on a limited set of target APIs, requiring new data to maintain their quality when interfacing with new services, while LAs are not trained to maintain user intent over multi-turn conversations. Because both robust multi-turn management and advanced function calling are crucial for effective conversational agents, we evaluate these skills on three popular benchmarks: MultiWOZ 2.4 (TOD), BFCL V3 (LA), and API-Bank (LA). Our analyses reveal that specialized approaches excel in one domain but underperform in the other. To bridge this chasm, we introduce CALM (Conversational Agentic Language Model), a unified approach that integrates both conversational and agentic capabilities. We created CALM-IT, a carefully constructed multi-task dataset that interleaves multi-turn ReAct reasoning with complex API usage. Using CALM-IT, we train three models, CALM 8B, CALM 70B, and CALM 405B, which outperform top domain-specific models, including GPT-4o, across all three benchmarks.
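To make the dataset design concrete, here is a minimal sketch of what a CALM-IT-style training sample could look like: a multi-turn TOD exchange whose assistant turn interleaves ReAct-style reasoning ("Thought") with a structured API call ("Action") and its result ("Observation"). The field names, function name, and schema here are illustrative assumptions, not the actual CALM-IT format.

```python
import json

# Hypothetical CALM-IT-style sample (illustrative schema, not the real one):
# a multi-turn dialogue whose assistant turn interleaves ReAct reasoning
# with a function call and its observed result.
sample = {
    "dialogue": [
        {
            "role": "user",
            "content": "Find me a cheap Italian restaurant in the centre.",
        },
        {
            "role": "assistant",
            "content": (
                "Thought: The user wants a restaurant; I should query the booking API.\n"
                "Action: find_restaurant(area='centre', food='italian', pricerange='cheap')\n"
                "Observation: [{'name': 'Pizza Hut City Centre', 'phone': '01223323737'}]\n"
                "Response: Pizza Hut City Centre matches your request. "
                "Would you like to book a table?"
            ),
        },
        {"role": "user", "content": "Yes, for 2 people at 7pm."},
    ]
}

# Serialize the sample as one training record (e.g. one JSONL line).
record = json.dumps(sample)
print(len(record) > 0)
```

Training on records like this, rather than on single-turn tool-call pairs, is what would let one model practice both dialogue state tracking (carrying "2 people at 7pm" back into a later API call) and function calling in the same trajectory.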
Community
🚀 Can a Single Model Master Both Multi-turn Conversations and Tool Use?
Introducing CALM, a family of fully open-source Conversational Agentic Language Models (CALM 8B, CALM 70B, and CALM 405B), excelling in both multi-turn dialogue management and function calling.
🦍 CALM 405B is the largest open model on the BFCL V3 Leaderboard, ranking #7 and surpassing many proprietary models.
Leaderboard: https://lnkd.in/dxzassRC
Most models struggle with either long-term conversations and dialogue state tracking (TOD) or function calling (LA). CALM (Conversational Agentic Language Model) bridges this gap! It is trained on CALM-IT, our unified dataset blending multi-turn ReAct-style TOD with complex API use, using the Oumi AI platform in partnership with Oumi and Together AI.
📊 Models: CALM 8B, CALM 70B, and CALM 405B, trained from the Llama model series.
How does the CALM model family perform?
✅ Outperforms GPT-4o and other top domain-specific models on:
📌 MultiWOZ 2.4 (TOD)
📌 BFCL V3 (Function Calling)
📌 API-Bank (Function Calling)
Achieving top zero-shot scores not on just one benchmark, but across all three!
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- Conversation Routines: A Prompt Engineering Framework for Task-Oriented Dialog Systems (2025)
- DeepThink: Aligning Language Models with Domain-Specific User Intents (2025)
- IntellAgent: A Multi-Agent Framework for Evaluating Conversational AI Systems (2025)
- FREYR: A Framework for Recognizing and Executing Your Requests (2025)
- InfiGUIAgent: A Multimodal Generalist GUI Agent with Native Reasoning and Reflection (2025)
- PoAct: Policy and Action Dual-Control Agent for Generalized Applications (2025)
- Self-Training Large Language Models for Tool-Use Without Demonstrations (2025)