Qwen2.5-Coder-32B-Glaive-ToolCall

Model Description

This model is a fine-tuned version of Qwen/Qwen2.5-Coder-32B-Instruct, enhanced specifically for tool calling. It was trained on the Glaive Function Calling v2 dataset (glaiveai/glaive-function-calling-v2) to improve its ability to understand and generate function calls across a range of programming and automation contexts.

Model Details

  • Base Model: Qwen/Qwen2.5-Coder-32B-Instruct
  • Model Type: Large Language Model (LLM) with enhanced tool calling capabilities
  • Architecture: Transformer-based decoder model
  • Parameters: 32 billion parameters
  • Fine-tuning Method: LoRA (Low-Rank Adaptation)
  • Training Dataset: glaive-function-calling-v2
  • Language Support: Multilingual

Training Configuration

  • Fine-tuning Type: LoRA with rank 8, alpha 16 (an equivalent configuration sketch follows this list)
  • Training Epochs: 3.0
  • Learning Rate: 5e-5 with cosine scheduler
  • Batch Size: 2 per device with 8 gradient accumulation steps
  • Context Length: 2048 tokens (training sequence length; the base model supports longer contexts at inference)
  • Optimizer: AdamW
  • Precision: BF16
  • Max Samples: 100,000
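
For reference, the hyperparameters above correspond roughly to the following PEFT / transformers setup. This is an illustrative reconstruction rather than the actual LLaMA-Factory configuration used for training; the target modules and dropout value in particular are assumptions.

from peft import LoraConfig
from transformers import TrainingArguments

# LoRA adapter matching the card's rank/alpha; target_modules and
# lora_dropout are assumed, not taken from the original run
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,  # assumption
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumption
    task_type="CAUSAL_LM",
)

# Optimization settings as listed above
training_args = TrainingArguments(
    output_dir="qwen25-coder-glaive-toolcall",
    num_train_epochs=3.0,
    learning_rate=5e-5,
    lr_scheduler_type="cosine",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,
    bf16=True,
    optim="adamw_torch",
)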

Enhanced Capabilities

Tool Calling Improvements

This model demonstrates significant improvements in:

  1. Function Schema Understanding: Enhanced ability to parse and understand complex function signatures and parameter requirements
  2. Context-Aware Tool Selection: Improved decision-making for selecting appropriate tools based on user queries
  3. Parameter Extraction: Better extraction and formatting of function parameters from natural language inputs
  4. Multi-step Tool Orchestration: Enhanced capability to chain multiple tool calls for complex tasks
  5. Error Handling: Improved error detection and recovery in tool calling scenarios

Key Features

  • Robust JSON Generation: Produces well-formatted JSON for function calls with proper schema adherence (see the example after this list)
  • Natural Language Integration: Seamlessly integrates tool calls within conversational responses
  • Code Generation with Tools: Enhanced ability to generate code that incorporates external tool usage
  • API Integration: Improved understanding of REST APIs, GraphQL, and other web service interfaces
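
For example, given an OpenAI-style function schema such as the following (this get_weather schema is purely illustrative, not a built-in tool):

{
  "name": "get_weather",
  "description": "Get the current weather for a location",
  "parameters": {
    "type": "object",
    "properties": {
      "location": {"type": "string", "description": "City name"},
      "units": {"type": "string", "enum": ["metric", "imperial"], "default": "metric"}
    },
    "required": ["location"]
  }
}

the model is expected to emit a call of the form:

{"name": "get_weather", "arguments": {"location": "New York City", "units": "metric"}}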

Use Cases

This model is particularly well-suited for:

  • AI Assistants: Building conversational AI that can interact with external systems
  • Automation Workflows: Creating intelligent automation scripts with dynamic tool usage
  • Code Generation: Generating code that integrates with APIs and external services
  • Data Processing: Automating data analysis and processing tasks with appropriate tools
  • System Integration: Building bridges between different software systems and services

Usage Example

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load the model and tokenizer
model_name = "RekklesAI/Qwen2.5-Coder-32B-Glaive-ToolCall"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Example prompt for tool calling: the system message declares the available
# tool, and the user question should trigger a call to it
messages = [
    {
        "role": "system",
        "content": (
            "You have access to a weather API. Help the user get the current weather.\n\n"
            "Available tools:\n"
            '- get_weather(location: str, units: str = "metric") -> dict'
        ),
    },
    {"role": "user", "content": "What's the weather like in New York City?"},
]

# Apply the Qwen2.5 chat template and move the inputs to the model's device
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Generate response
with torch.no_grad():
    outputs = model.generate(
        input_ids,
        max_new_tokens=512,
        temperature=0.7,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id,
    )

# Decode only the newly generated tokens
response = tokenizer.decode(outputs[0][input_ids.shape[1]:], skip_special_tokens=True)
print(response)
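
Depending on the prompt, the generated text may contain the tool call either as a bare JSON object or wrapped in a <functioncall> tag as in the Glaive training data. Below is a best-effort extraction sketch; the tag handling and parsing fallbacks are assumptions about the output format, not a guaranteed contract:

import ast
import json
import re

def extract_tool_call(text: str):
    """Best-effort extraction of a tool call from generated text."""
    candidates = []
    # Prefer an explicitly tagged call (the Glaive convention), if present
    tagged = re.search(r"<functioncall>(.*?)(?:</functioncall>|$)", text, re.DOTALL)
    if tagged:
        candidates.append(tagged.group(1).strip())
    # Fall back to the widest brace-delimited span in the text
    braced = re.search(r"\{.*\}", text, re.DOTALL)
    if braced:
        candidates.append(braced.group(0))
    for candidate in candidates:
        # Glaive-style calls sometimes wrap "arguments" in single quotes,
        # which parses as a Python literal but not as JSON
        for parse in (json.loads, ast.literal_eval):
            try:
                parsed = parse(candidate)
            except (ValueError, SyntaxError):
                continue
            if isinstance(parsed, dict):
                return parsed
    return None

call = extract_tool_call(response)
if call is not None:
    print("Tool:", call.get("name"), "| Arguments:", call.get("arguments"))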

Performance Metrics

Formal benchmark scores are not reported for this fine-tune. Qualitatively, the training targets improvements in the following areas:

  • Function Call Accuracy: Enhanced precision in generating syntactically correct function calls
  • Parameter Extraction: Improved accuracy in extracting relevant parameters from user queries
  • Tool Selection: Better performance in selecting appropriate tools for given tasks
  • JSON Formatting: Reduced errors in JSON structure and formatting

Training Loss

The following chart shows the training loss progression during the fine-tuning process:

[Figure: training loss curve]

Training loss curve demonstrating stable convergence over 3 epochs with the Glaive Function Calling v2 dataset.

Limitations

  • The model's tool calling capabilities are primarily trained on the patterns present in the Glaive Function Calling v2 dataset
  • Performance may vary for highly specialized or domain-specific tools not represented in the training data
  • Like all LLMs, the model may occasionally generate plausible-sounding but incorrect tool calls
  • The model requires careful prompt engineering for optimal tool calling performance

Ethical Considerations

  • Tool Safety: Users should implement proper validation and sandboxing when allowing the model to execute actual tool calls (a validation sketch follows this list)
  • Access Control: Implement appropriate access controls and permissions for tools accessible to the model
  • Data Privacy: Be mindful of sensitive data that might be passed through tool calls
  • Monitoring: Implement logging and monitoring for tool usage in production environments
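
As a starting point for the Tool Safety point above, here is a minimal allowlist-plus-schema validation sketch; the jsonschema dependency, the allowlist contents, and the stub implementation are all illustrative assumptions:

import json
from jsonschema import ValidationError, validate  # pip install jsonschema

# Allowlist mapping tool names to a parameter schema and an implementation;
# the get_weather entry is a stub for illustration only
ALLOWED_TOOLS = {
    "get_weather": {
        "schema": {
            "type": "object",
            "properties": {
                "location": {"type": "string"},
                "units": {"type": "string", "enum": ["metric", "imperial"]},
            },
            "required": ["location"],
            "additionalProperties": False,
        },
        "fn": lambda location, units="metric": {"location": location, "temp": 21},
    },
}

def safe_execute(call: dict):
    """Validate a model-proposed tool call before running anything."""
    name = call.get("name")
    if name not in ALLOWED_TOOLS:
        raise PermissionError(f"Tool {name!r} is not on the allowlist")
    args = call.get("arguments", {})
    if isinstance(args, str):  # Glaive-style string-encoded arguments
        args = json.loads(args)
    try:
        validate(instance=args, schema=ALLOWED_TOOLS[name]["schema"])
    except ValidationError as exc:
        raise ValueError(f"Arguments rejected: {exc.message}") from exc
    return ALLOWED_TOOLS[name]["fn"](**args)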

Training Data

The model was fine-tuned using the Glaive Function Calling v2 dataset (glaiveai/glaive-function-calling-v2), a comprehensive and high-quality dataset specifically designed for training language models in function calling capabilities.

Dataset Overview

  • Dataset Size: 113,000 training examples
  • Format: JSON with structured conversations
  • Language: English
  • License: Apache 2.0
  • Source: Glaive AI

Dataset Characteristics

The Glaive Function Calling v2 dataset is meticulously curated to provide diverse and realistic function calling scenarios:

Conversation Structure

  • System Messages: Define the assistant's role and available functions with detailed schemas
  • Multi-turn Dialogues: Natural conversations between users and AI assistants
  • Function Calls: Properly formatted JSON function invocations (illustrated in the example after this list)
  • Function Responses: Realistic API responses and result handling
  • Error Scenarios: Examples of graceful error handling and capability limitations
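
An abridged example in the style of the dataset's raw format (reconstructed for illustration: the function, values, and elided schema are invented, not taken from the dataset):

SYSTEM: You are a helpful assistant with access to the following functions. Use them if required -
{"name": "get_stock_price", "description": "Get the current stock price", "parameters": {...}}
USER: Can you tell me the current price of AAPL?
ASSISTANT: <functioncall> {"name": "get_stock_price", "arguments": '{"symbol": "AAPL"}'}
FUNCTION RESPONSE: {"price": 178.25}
ASSISTANT: Apple (AAPL) is currently trading at $178.25.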

Function Diversity

The dataset covers a wide range of function types and use cases:

  • Utility Functions: Email sending, calendar management, password generation
  • Data Retrieval: News headlines, stock prices, weather information
  • Computational Tasks: Mathematical calculations, unit conversions, data analysis
  • Search Operations: Movie searches, book lookups, general information retrieval
  • Communication Tools: Contact management, messaging systems
  • Financial Services: Exchange rates, loan calculations, investment data
  • Content Creation: Text generation, formatting, summarization

Quality Features

  1. Realistic Scenarios: Conversations mirror real-world user interactions with AI assistants
  2. Proper Error Handling: Examples of polite refusals when functions are unavailable
  3. Parameter Validation: Correct handling of required and optional function parameters
  4. Context Awareness: Functions are called appropriately based on conversation context
  5. Natural Language Integration: Seamless integration of function results into conversational responses

Training Examples Include:

  • Single Function Calls: Simple, direct function invocations
  • Multi-step Workflows: Complex scenarios requiring multiple function calls
  • Parameter Extraction: Converting natural language requests into structured function parameters
  • Response Formatting: Presenting function results in user-friendly formats
  • Capability Boundaries: Clear communication of system limitations

Dataset Impact on Model Performance

This carefully curated dataset enables the model to:

  • Understand Function Schemas: Parse and comprehend complex function definitions
  • Extract Parameters: Accurately identify and format required function arguments from user queries
  • Generate Valid JSON: Produce syntactically correct function calls
  • Handle Edge Cases: Manage scenarios where requested functions are unavailable
  • Maintain Conversational Flow: Integrate function calling seamlessly into natural dialogue
  • Provide Helpful Responses: Transform function results into meaningful user communications

Technical Implementation

The dataset follows industry-standard formats for function calling:

  • OpenAI-compatible function schemas (see the sketch after this list)
  • Structured JSON for function definitions and calls
  • Clear separation between system instructions, user queries, and function responses
  • Consistent formatting across all examples
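
Because the schemas are OpenAI-compatible, they can also be supplied through the tools argument of tokenizer.apply_chat_template, which recent transformers releases and the Qwen2.5 chat template support. Whether this fine-tune responds better to that native format or to Glaive-style system prompts has not been verified here; a sketch with a hypothetical get_weather definition:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("RekklesAI/Qwen2.5-Coder-32B-Glaive-ToolCall")

# Hypothetical OpenAI-compatible tool definition
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {"type": "string"},
                    "units": {"type": "string", "enum": ["metric", "imperial"]},
                },
                "required": ["location"],
            },
        },
    }
]

messages = [{"role": "user", "content": "What's the weather like in New York City?"}]

# Render a prompt with the tool definitions embedded by the chat template
prompt_text = tokenizer.apply_chat_template(
    messages, tools=tools, add_generation_prompt=True, tokenize=False
)
print(prompt_text)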

This training data prepares the model to handle real-world function calling scenarios, making it well suited to AI assistant applications, automation workflows, and API integration tasks.

Technical Specifications

  • Framework: Built using LLaMA-Factory
  • Hardware Requirements: Recommended 80GB+ VRAM for inference
  • Quantization: Compatible with various quantization methods (GPTQ, AWQ, bitsandbytes, etc.); a 4-bit loading sketch follows this list
  • Deployment: Suitable for both cloud and on-premise deployment
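
For GPUs below the recommended 80 GB, a quantized load is one option. A minimal 4-bit bitsandbytes sketch follows; the settings are illustrative, and tool-calling quality after quantization has not been measured for this model:

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Illustrative 4-bit NF4 quantization: cuts the ~65 GB BF16 weight
# footprint to roughly 20 GB, at some cost in output quality
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "RekklesAI/Qwen2.5-Coder-32B-Glaive-ToolCall",
    quantization_config=bnb_config,
    device_map="auto",
)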

Citation

If you use this model in your research or applications, please cite:

@misc{qwen25-coder-glaive-toolcall,
  title={Qwen2.5-Coder-32B-Glaive-ToolCall},
  author={RekklesAI},
  year={2025},
  howpublished={\url{https://huggingface.co/RekklesAI/Qwen2.5-Coder-32B-Glaive-ToolCall}},
  note={Fine-tuned version of Qwen2.5-Coder-32B-Instruct with enhanced tool calling capabilities using the Glaive Function Calling v2 dataset}
}

License

This model is released under the Apache 2.0 license.

Acknowledgments

  • Qwen Team: For the excellent base model Qwen2.5-Coder-32B-Instruct
  • Glaive: For providing the high-quality tool calling dataset
  • LLaMA-Factory: For the efficient fine-tuning framework

This model card follows the guidelines for responsible AI model documentation and transparency.
