This model is a fine-tuned version of Qwen2.5-0.5B-Instruct specifically designed for evaluating function calls in the context of Model Context Protocol (MCP) tools. It can assess whether a function call is correct, uses the wrong tool, has incorrect parameter names, or has incorrect parameter values.
The prompt for the model takes two inputs:

- `available_tools`: a list of the tool schemas
- `message_history`: the user request and the model's tool call response, as a list of JSON objects

EVALUATOR_PROMPT = """\
# TOOL CALL EVALUATION RUBRIC
## EVALUATION CRITERIA
### 1. TOOL SELECTION
- [ ] Function name exists in available tools
- [ ] Function purpose matches user intent
### 2. PARAMETER STRUCTURE
- [ ] All required and relevant parameters are present
- [ ] No hallucinated parameter names
- [ ] Parameter names match tool schema exactly
### 3. PARAMETER VALUES
- [ ] Data types match expected types
- [ ] Values align with user request
- [ ] No fabricated or incorrect values
## CLASSIFICATION RULES
- All criteria passed → `correct`
- Failed criteria 1 → `incorrect_tool`
- Failed criteria 2 → `incorrect_parameter_names`
- Failed criteria 3 → `incorrect_parameter_values`
---
### AVAILABLE TOOLS
{available_tools}
---
### MESSAGE HISTORY
{message_history}
---
## OUTPUT REQUIREMENT
{{
"score": < correct | incorrect_tool | incorrect_parameter_names | incorrect_parameter_values >,
"reason": < [if incorrect, provide a brief list of reasons] >
}}
### EVALUATION:
"""
SYSTEM_PROMPT = "You are an expert evaluator of function calls. You will be given a function call and a list of available tools. You will need to evaluate the function call and return a score and a reason for the score."
available_tools = [
    {
        "name": "google-play-developer",
        "description": "Get apps by a developer on Google Play",
        "input_schema": {
            "type": "object",
            "properties": {
                "devId": {"type": "string", "description": "Developer ID"},
                "num": {"type": "number", "default": 60, "description": "Number of results"},
                "lang": {"type": "string", "default": "en", "description": "Language code"},
                "country": {"type": "string", "default": "us", "description": "Country code"}
            },
            "required": ["devId"]
        }
    }
]
message_history = [
    {"role": "user", "content": "I'm looking to evaluate the performance of all the apps developed by 'Example Developer' on the Google Play Store. Could you provide me with a list of their recent applications, specifically in English and focused on the US market? Please limit the results to 50 apps for a quicker review."},
    {"role": "assistant", "content": {"function": {"name": "google-play-developer", "arguments": {"devId": "com.example.developer", "num": 50, "lang": "en", "country": "us"}}}}
]
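
With the example inputs above, the evaluator prompt is assembled by filling the `{available_tools}` and `{message_history}` placeholders (the doubled braces in the template escape the literal JSON braces in the output requirement). A minimal sketch, assuming the inputs are serialized with `json.dumps`; any consistent JSON rendering should work the same way:

import json

# Fill the template placeholders with JSON-serialized inputs
# (json.dumps here is an assumption, not a requirement of the model)
user_prompt = EVALUATOR_PROMPT.format(
    available_tools=json.dumps(available_tools, indent=2),
    message_history=json.dumps(message_history, indent=2),
)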
The model outputs evaluations in JSON format:
{
  "score": "correct|incorrect_tool|incorrect_parameter_names|incorrect_parameter_values",
  "reason": ["reasons for failure if incorrect"]
}
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the evaluator and its tokenizer from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("quotientai/limbic-tool-use-0.5B-32K")
model = AutoModelForCausalLM.from_pretrained("quotientai/limbic-tool-use-0.5B-32K")
To make a prediction, convert the formatted prompt into the model's chat format.
chat_template = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": user_prompt},
]

# Apply the chat template
text = tokenizer.apply_chat_template(chat_template, tokenize=False, add_generation_prompt=True)

# Tokenize with truncation and move the inputs to the model's device
inputs = tokenizer(text, return_tensors="pt", truncation=True).to(model.device)

# Generate the evaluation
result = model.generate(**inputs, max_new_tokens=128, use_cache=True)
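
The generated sequence includes the echoed prompt, so strip it before decoding. A short sketch of recovering the JSON evaluation; the `json.loads` step assumes the model emits well-formed JSON, which may not hold for every input:

# Decode only the newly generated tokens, skipping the echoed prompt
output_text = tokenizer.decode(
    result[0][inputs["input_ids"].shape[-1]:],
    skip_special_tokens=True,
)

# Parse the evaluation; fall back to the raw text if the JSON is malformed
try:
    evaluation = json.loads(output_text)
    print(evaluation["score"], evaluation.get("reason"))
except json.JSONDecodeError:
    print(output_text)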
@misc{limbic-tool-use-0.5B-32K,
  title={Limbic Tool Use Evaluator},
  author={QuotientAI},
  year={2025},
  url={https://huggingface.co/quotientai/limbic-tool-use-0.5B-32K}
}