---
license: other
tags:
- mcp
- tools quality
- tool quality selection
- tool selection
- TSQ
- TUQ
- sequence-classification
- tool-evaluation
- function-call
- limbic
- tool use
- tool quality
pipeline_tag: text-classification
language:
- en
base_model:
- Qwen/Qwen3-0.6B
---

![](https://pixel.qualifire.ai/api/record/ranger)

## 🧠 Model Description

The **mcp-tool-use-quality-ranger-0.6b** is a fine-tuned sequence classification model for **evaluating the quality of function calls** in conversational AI systems, designed around **Model Context Protocol (MCP) tools**. It assesses whether a function call is correct, uses the wrong tool, has incorrect parameter names, or has incorrect parameter values.

**Max Context Length:** **32,768 tokens**

For a given function call, the model determines whether it:

- Selects the correct tool
- Has correct parameter names and structure
- Contains correct parameter values

It produces one of four possible classification labels:

| Label | Meaning |
|-------|---------|
| **VALID_CALL** | ✅ The tool name, parameters, and values are all correct, or no suitable tool exists and no function call is made. |
| **TOOL_ERROR** | ❌ The tool name does not exist or does not match the user intent. |
| **PARAM_NAME_ERROR** | ❌ The correct tool is used, but parameter names are missing, extra, or incorrect. |
| **PARAM_VALUE_ERROR** | ❌ Tool and parameter names are correct, but parameter values are wrong or incorrectly formatted. |

---

## 🔽 Quantized Version

- 🪶 **GGUF**: [qualifire/mcp-tool-use-quality-ranger-0.6b-GGUF](https://huggingface.co/qualifire/mcp-tool-use-quality-ranger-0.6b-GGUF)

---

## 📊 Benchmark Evaluation

The **mcp-tool-use-quality-ranger-0.6b** was evaluated in a binary classification setting, where a prediction counts as **Correct** if the function call evaluation matches the gold label, and **Incorrect** otherwise.

| Model | #Params | Avg. Latency (sec) | Avg. Binary Accuracy | [Qualifire mcp-tool-use-quality Benchmark](https://huggingface.co/datasets/qualifire/mcp-tool-use-quality-benchmark) Binary Accuracy | [Limbic Benchmark](https://huggingface.co/datasets/quotientai/limbic-eval-tool-use-mcp) Binary Accuracy |
|---|---|---|---|---|---|
| qualifire/mcp-tool-use-quality-ranger-4b [private] | 4B | 0.30 | 0.962 | 0.971 | 0.954 |
| **qualifire/mcp-tool-use-quality-ranger-0.6b** | **0.6B** | **0.09** | **0.928** | **0.949** | **0.907** |
| gemini-2.5-flash | - | 4.87 | 0.858 | 0.871 | 0.845 |
| quotientai/limbic-tool-use-0.5B-32K | 0.5B | 0.79 | 0.798 | 0.708 | 0.887 |

### 📌 Metrics Definitions

- **Avg. Binary Accuracy** – Mean accuracy across all evaluated benchmarks, where predictions are mapped to binary outcomes as follows (see the sketch after this list):
  - **Qualifire TUQ Benchmark**
    - **Correct** → `VALID_CALL`
    - **Incorrect** → `TOOL_ERROR`, `PARAM_NAME_ERROR`, or `PARAM_VALUE_ERROR`
  - **Limbic Benchmark**
    - **Correct** → `correct`
    - **Incorrect** → `incorrect_tool`, `incorrect_parameter_names`, or `incorrect_parameter_values`
- **Qualifire TUQ Benchmark** link – [Qualifire Tool Selection Quality Benchmark](https://huggingface.co/datasets/qualifire/mcp-tool-selection-quality-benchmark).
- **Limbic Benchmark** link – [Limbic Eval Tool Use MCP Benchmark](https://huggingface.co/datasets/quotientai/limbic-eval-tool-use-mcp).
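For illustration, a minimal sketch of this label-to-binary mapping is shown below. The label strings come from the tables above; the helper function and its name are illustrative and not part of the benchmark tooling.

```python
# Illustrative sketch of the binary mapping described above (helper name is hypothetical).
QUALIFIRE_CORRECT = {"VALID_CALL"}
LIMBIC_CORRECT = {"correct"}

def to_binary(label: str, benchmark: str = "qualifire") -> bool:
    """Map a predicted label to the Correct/Incorrect outcome used for scoring."""
    correct_labels = QUALIFIRE_CORRECT if benchmark == "qualifire" else LIMBIC_CORRECT
    return label in correct_labels

assert to_binary("VALID_CALL") is True
assert to_binary("PARAM_NAME_ERROR") is False
assert to_binary("incorrect_tool", benchmark="limbic") is False
```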
---

## 📜 Evaluation Prompt Template

The model uses the following structured evaluation process:

1. **TOOL SELECTION**
   - Check that the tool name exists in `available_tools`
   - Check that the tool's purpose matches the user intent
   - Fail → `TOOL_ERROR` ❌
2. **PARAMETER STRUCTURE**
   - All required parameters are present
   - No extra parameters
   - Parameter names exactly match the schema
   - Fail → `PARAM_NAME_ERROR` ❌
3. **PARAMETER VALUES**
   - Values have correct data types
   - Values match the user request
   - No fabricated or incorrect values
   - Fail → `PARAM_VALUE_ERROR` ❌

If all checks pass → `VALID_CALL` ✅

---

### 📦 Requirements

- `transformers>=4.51.0`
- `huggingface_hub`
- `torch`

---

## 💻 Usage

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer, pipeline
import torch
from huggingface_hub import hf_hub_download

# Model name
model_name = "qualifire/mcp-tool-use-quality-ranger-0.6b"

# Load model and tokenizer
model = AutoModelForSequenceClassification.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Create pipeline
pipe = pipeline("text-classification", model=model, tokenizer=tokenizer)

# Load the prompt template shipped with the model repository
file_path = hf_hub_download(repo_id=model_name, filename="tsq_prompt_template.txt")
with open(file_path, encoding="utf-8") as f:
    PROMPT_TEMPLATE = f.read()

# Example inputs: the available tools (with their JSON schemas) and the conversation to evaluate
example_tools_list = '''[
  {
    "name": "order_food",
    "description": "Order food from a restaurant.\nArgs:\nrestaurant_url: URL of the restaurant\nitem_name: Name of the item to order",
    "inputSchema": {
      "type": "object",
      "title": "order_foodArguments",
      "required": ["item_url", "item_name"],
      "properties": {
        "item_url": {
          "type": "string",
          "title": "Item Url"
        },
        "item_name": {
          "type": "string",
          "title": "Item Name"
        }
      }
    }
  }
]'''

example_message_history = '''[
  {
    "role": "user",
    "content": "Could you please order 2 Margherita pizzas for delivery to 123 Main Street, Anytown?"
  },
  {
    "completion_message": {
      "content": {
        "type": "text",
        "text": ""
      },
      "role": "assistant",
      "stop_reason": "tool_calls",
      "tool_calls": [
        {
          "id": "call_p8yj1p",
          "function": {
            "name": "order_food",
            "arguments": {
              "item": "Margherita Pizza",
              "quantity": 3,
              "delivery_address": "123 Main Street, Anytown"
            }
          }
        }
      ]
    }
  }
]'''

# Format the input with the prompt template
example_input = PROMPT_TEMPLATE.format(
    message_history=example_message_history,
    available_tools=example_tools_list
)

# Get prediction
result = pipe(example_input)[0]
print(result)
```

## ✨ Example Output

```
{'label': 'PARAM_VALUE_ERROR', 'score': 0.8680815696716309}
```

The user asked for 2 pizzas, but the call sets `quantity` to 3, so the correct label is `PARAM_VALUE_ERROR`.
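As a follow-up to the usage example, a hypothetical convenience wrapper is sketched below. It assumes `pipe`, `PROMPT_TEMPLATE`, and the example inputs are already defined as in the snippet above; the function name and the added `is_valid` field are illustrative, not part of the model's API.

```python
# Hypothetical wrapper around the usage example above.
# Assumes `pipe` and `PROMPT_TEMPLATE` are already defined as shown there.
def evaluate_tool_call(message_history: str, available_tools: str) -> dict:
    """Classify a tool call and flag whether it would count as a valid call."""
    prompt = PROMPT_TEMPLATE.format(
        message_history=message_history,
        available_tools=available_tools,
    )
    result = pipe(prompt)[0]
    result["is_valid"] = result["label"] == "VALID_CALL"
    return result

print(evaluate_tool_call(example_message_history, example_tools_list))
# e.g. {'label': 'PARAM_VALUE_ERROR', 'score': 0.868..., 'is_valid': False}
```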