# Qwen2.5-1.5B-Instruct Function Calling Model

## Model Description
This model is a fine-tuned version of Qwen/Qwen2.5-1.5B-Instruct optimized for function calling. It was trained using GRPO (Group Relative Policy Optimization) on the NousResearch/hermes-function-calling-v1 dataset, specifically the `func_calling_singleturn` subset.
## Intended Uses
This model is designed for:
- Small agentic setups where an agent needs low latency but good accuracy on medium-complexity tasks
- Basic chatbots that need to scale horizontally with minimal vertical scaling
- Parsing user requests and identifying when to call specific functions
- Generating accurate function call schemas based on user inputs
- Supporting tool use in conversational AI applications
- Enabling structured data extraction from natural language
## Training Details
- Base Model: Qwen/Qwen2.5-1.5B-Instruct
- Training Method: GRPO (Group Relative Policy Optimization)
- Training Framework: Unsloth
- Dataset: NousResearch/hermes-function-calling-v1 (func_calling_singleturn)
- Quantization: 4-bit quantization using bitsandbytes (bnb)
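The exact reward setup is not published in this card. Since GRPO was aimed mainly at enforcing the output format (see Strengths below), a format reward along the following lines is a plausible sketch. It assumes TRL's `GRPOTrainer` reward-function interface, which Unsloth's GRPO workflow builds on; the regex and reward values are illustrative assumptions, not the actual training configuration.

```python
# Hypothetical sketch of a format reward for GRPO training (not the
# actual script). Assumes TRL's GRPOTrainer interface, where reward
# functions receive generated completions and return one score each.
# For standard-format datasets, completions arrive as plain strings.
import re

# Expected layout: a <chain_of_thought> block followed by a <tool_call> block.
EXPECTED_FORMAT = re.compile(
    r"<chain_of_thought>.*?</chain_of_thought>\s*<tool_call>.*?</tool_call>",
    re.DOTALL,
)

def format_reward(completions, **kwargs):
    """Score 1.0 for completions that follow the tag format, else 0.0."""
    return [1.0 if EXPECTED_FORMAT.search(c) else 0.0 for c in completions]

# A function like this would be passed via reward_funcs=[format_reward]
# when constructing TRL's GRPOTrainer.
```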
## Performance and Limitations

### Strengths
- Strong format following: GRPO was used mainly to improve format adherence rather than raw accuracy, so the model does not break the expected structure even when generating multiple tool calls
- Chain-of-thought traces expose the model's current alignment; a further DPO pass on top of this GRPO model could improve accuracy significantly
- At 1.5B parameters, the model is small enough to run at decent speed on capable CPU hardware
- Efficiently handles function calling with minimal computational resources
- Maintains the conversational capabilities of the base Qwen2.5-1.5B-Instruct model
- 4-bit quantization enables deployment on resource-constrained environments
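As an illustration of the last point, a 4-bit load via plain transformers + bitsandbytes (an alternative to the Unsloth path shown under Usage below) might look like the following. The repo id is taken from the Usage section; this sketch assumes the repo hosts full model weights rather than only LoRA adapters.

```python
# Alternative 4-bit load with transformers + bitsandbytes
# (the Usage section below shows the Unsloth path instead).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # 4-bit weights
    bnb_4bit_quant_type="nf4",              # NormalFloat4 quantization
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute in bf16
)

model = AutoModelForCausalLM.from_pretrained(
    "Bharatdeep-H/xml_cot_fm_1",  # assumes full weights in this repo
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("Bharatdeep-H/xml_cot_fm_1")
```

From here, `tokenizer.apply_chat_template` and generation work the same way as in the Usage section below.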
### Limitations
- Beyond roughly 5,000 input tokens the model starts regressing. DPO or ORPO on specific cases could improve this; given the limitation, keep function descriptions brief if you want to scale horizontally
- Chain-of-thought reasoning traces can become very lengthy at times. The model can take an instruction header for the CoT to (1) reduce the length of reasoning traces and (2) enhance accuracy by prescribing what goes inside the CoT tags; currently I'm relying on the model's own CoT capability alone (see the sketch after this list)
- Performance may vary compared to larger function calling models
- 1.5B parameter size inherently limits complexity of reasoning compared to larger models
- May struggle with highly complex or multi-step function calling scenarios
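The CoT instruction header mentioned above is untested here; a hypothetical version, extending the `FORMAT_PROMPT` from the Usage section, might look like this:

```python
# Hypothetical CoT instruction header (untested suggestion, not part of
# the original card), extending the FORMAT_PROMPT used in Usage below.
FORMAT_PROMPT_WITH_COT_RULES = """
Respond in the following format:
<chain_of_thought>
...
</chain_of_thought>
<tool_call>
...
</tool_call>

Inside <chain_of_thought>, follow these rules:
1. Keep the reasoning under 100 words.
2. State only: the user's intent, the matching function(s), and the argument values.
"""
```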
## Usage

```python
from unsloth import FastLanguageModel
import torch
from vllm import SamplingParams
max_seq_length = 4096 # Can increase for longer reasoning traces
lora_rank = 32 # Larger rank = smarter, but slower
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "Bharatdeep-H/xml_cot_fm_1",
    # model_name = "unsloth/Qwen2.5-1.5B-Instruct",
    max_seq_length = max_seq_length,
    load_in_4bit = True,  # False for LoRA 16bit
    fast_inference = True,  # Enable vLLM fast inference
    max_lora_rank = lora_rank,
    gpu_memory_utilization = 0.6,  # Reduce if out of memory
)
model = FastLanguageModel.get_peft_model(
    model,
    r = lora_rank,  # Choose any number > 0! Suggested 8, 16, 32, 64, 128
    target_modules = [
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],  # Remove QKVO if out of memory
    lora_alpha = lora_rank,
    use_gradient_checkpointing = "unsloth",  # Enable long context finetuning
    random_state = 3407,
)
# Format definitions
FORMAT_PROMPT = """
Respond in the following format:
<chain_of_thought>
...
</chain_of_thought>
<tool_call>
...
</tool_call>
"""
SYSTEM_MIX_USER_PROMPT = "You are a function calling AI model. You are provided with function signatures within <tools> </tools> XML tags. You may call one or more functions to assist with the user query. Don't make assumptions about what values to plug into functions.\n\n<tools>[[{'type': 'function', 'function': {'name': 'book_appointment', 'description': 'Books an appointment for a patient with a specific dentist at a given date and time.', 'parameters': {'type': 'object', 'properties': {'patient_id': {'type': 'string', 'description': 'The unique identifier for the patient.'}, 'dentist_id': {'type': 'string', 'description': 'The unique identifier for the dentist.'}, 'preferred_date': {'type': 'string', 'description': 'The preferred date for the appointment.'}, 'time_slot': {'type': 'string', 'description': 'The preferred time slot for the appointment.'}}, 'required': ['patient_id', 'dentist_id', 'preferred_date', 'time_slot']}}}, {'type': 'function', 'function': {'name': 'reschedule_appointment', 'description': 'Reschedules an existing appointment to a new date and time.', 'parameters': {'type': 'object', 'properties': {'appointment_id': {'type': 'string', 'description': 'The unique identifier for the existing appointment.'}, 'new_date': {'type': 'string', 'description': 'The new date for the rescheduled appointment.'}, 'new_time_slot': {'type': 'string', 'description': 'The new time slot for the rescheduled appointment.'}}, 'required': ['appointment_id', 'new_date', 'new_time_slot']}}}, {'type': 'function', 'function': {'name': 'cancel_appointment', 'description': 'Cancels an existing appointment.', 'parameters': {'type': 'object', 'properties': {'appointment_id': {'type': 'string', 'description': 'The unique identifier for the appointment to be canceled.'}}, 'required': ['appointment_id']}}}, {'type': 'function', 'function': {'name': 'find_available_time_slots', 'description': 'Finds available time slots for a dentist on a given date.', 'parameters': {'type': 'object', 'properties': {'dentist_id': {'type': 'string', 'description': 'The unique identifier for the dentist.'}, 'date': {'type': 'string', 'description': 'The date to check for available time slots.'}}, 'required': ['dentist_id', 'date']}}}, {'type': 'function', 'function': {'name': 'send_appointment_reminder', 'description': 'Sends an automated reminder to the patient for an upcoming appointment.', 'parameters': {'type': 'object', 'properties': {'appointment_id': {'type': 'string', 'description': 'The unique identifier for the appointment.'}, 'reminder_time': {'type': 'string', 'description': 'The time before the appointment when the reminder should be sent.'}}, 'required': ['appointment_id', 'reminder_time']}}}]]</tools>\n\nFor each user query, you must:\n\n1. First, generate your reasoning within <chain_of_thought> </chain_of_thought> tags. This should explain your analysis of the user's request and how you determined which function(s) to call, or why no appropriate function is available.\n\n2. Then, call the appropriate function(s) by returning a JSON object within <tool_call> </tool_call> tags using the following schema:\n<tool_call>\n{'arguments': <args-dict>, 'name': <function-name>}\n</tool_call>\n\n3. If you determine that none of the provided tools can appropriately resolve the user's query based on the tools' descriptions, you must still provide your reasoning in <chain_of_thought> tags, followed by:\n<tool_call>NO_CALL_AVAILABLE</tool_call>\n\nRemember that your <chain_of_thought> analysis must ALWAYS precede any <tool_call> tags, regardless of whether a suitable function is available."
USER_QUERY = "As the manager of a dental practice, I'm looking to streamline our booking process. I need to schedule an appointment for our patient, John Doe with ID 'p123', with Dr. Sarah Smith, whose dentist ID is 'd456'. Please book this appointment for May 15, 2023, at 2:00 PM. Additionally, I would like to set up an automated reminder for John Doe to ensure he remembers his appointment. Can you book this appointment and arrange for the reminder to be sent out in advance?"
text = tokenizer.apply_chat_template([
    {'role': 'system', 'content': FORMAT_PROMPT},
    {'role': 'user', 'content': SYSTEM_MIX_USER_PROMPT + "\n\nUSER QUERY: " + USER_QUERY}
], tokenize = False, add_generation_prompt = True)
sampling_params = SamplingParams(
    temperature = 0.8,
    top_p = 0.95,
    max_tokens = 1024,
)
output = model.fast_generate(
    text,
    sampling_params = sampling_params,
    # lora_request = model.load_lora("grpo_saved_lora"),
)[0].outputs[0].text
print(output)
```
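To consume the output downstream, the two tagged sections can be split apart with a small parser. This helper is not part of the original card's code; it assumes the output follows the `FORMAT_PROMPT` layout above, and since the schema in the prompt uses Python-style single quotes, it falls back to `ast.literal_eval` when the tool call is not strict JSON.

```python
# Helper sketch (not from the original card): split the model output into
# its chain-of-thought and tool-call parts, assuming the FORMAT_PROMPT layout.
import ast
import json
import re

def parse_response(output: str):
    cot_match = re.search(r"<chain_of_thought>(.*?)</chain_of_thought>", output, re.DOTALL)
    call_match = re.search(r"<tool_call>(.*?)</tool_call>", output, re.DOTALL)
    cot = cot_match.group(1).strip() if cot_match else None
    call_raw = call_match.group(1).strip() if call_match else None

    if call_raw is None or call_raw == "NO_CALL_AVAILABLE":
        return cot, None
    try:
        call = json.loads(call_raw)        # strict JSON
    except json.JSONDecodeError:
        call = ast.literal_eval(call_raw)  # Python-style dict with single quotes
    return cot, call

cot, tool_call = parse_response(output)
print(cot)
print(tool_call)
```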
## Citation

If you intend to use this model for testing, hit me up!