# Qwen2.5-1.5B-Instruct Function Calling Model

## Model Description
This model is a fine-tuned version of Qwen/Qwen2.5-1.5B-Instruct optimized for function calling. It was trained using GRPO (Group Relative Policy Optimization) on the NousResearch/hermes-function-calling-v1 dataset, specifically the `func_calling_singleturn` subset.
## Intended Uses
This model is designed for:
- Small agentic setups where an agent needs low latency but good accuracy on medium-complexity tasks
- Basic chatbots that need to scale horizontally with minimal vertical scaling
- Parsing user requests and identifying when to call specific functions
- Generating accurate function call schemas based on user inputs
- Supporting tool use in conversational AI applications
- Enabling structured data extraction from natural language
## Training Details
- Base Model: Qwen/Qwen2.5-1.5B-Instruct
- Training Method: GRPO (Group Relative Policy Optimization)
- Training Framework: Unsloth
- Dataset: NousResearch/hermes-function-calling-v1 (func_calling_singleturn)
- Quantization: 4-bit quantization using bitsandbytes (bnb)
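The exact reward setup is not published in this card. Since GRPO was aimed mainly at enforcing the output format (see Strengths below), a format reward along the following lines is a plausible sketch. It assumes TRL's `GRPOTrainer` reward-function interface, which Unsloth's GRPO workflow builds on; the regex and reward values are illustrative assumptions, not the actual training configuration.

```python
# Hypothetical sketch of a format reward for GRPO training (not the
# actual script). Assumes TRL's GRPOTrainer interface, where reward
# functions receive generated completions and return one score each.
# For standard-format datasets, completions arrive as plain strings.
import re

# Expected layout: a <chain_of_thought> block followed by a <tool_call> block.
EXPECTED_FORMAT = re.compile(
    r"<chain_of_thought>.*?</chain_of_thought>\s*<tool_call>.*?</tool_call>",
    re.DOTALL,
)

def format_reward(completions, **kwargs):
    """Score 1.0 for completions that follow the tag format, else 0.0."""
    return [1.0 if EXPECTED_FORMAT.search(c) else 0.0 for c in completions]

# A function like this would be passed via reward_funcs=[format_reward]
# when constructing TRL's GRPOTrainer.
```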
## Performance and Limitations

### Strengths
- Strong format following: GRPO was used mainly to improve format adherence rather than raw accuracy, so the model does not break the expected structure even when generating multiple tool calls
- Chain-of-thought traces expose the model's current alignment; a further DPO pass on top of this GRPO model could improve accuracy significantly
- At 1.5B parameters, the model is small enough to run at decent speed on capable CPU hardware
- Efficiently handles function calling with minimal computational resources
- Maintains the conversational capabilities of the base Qwen2.5-1.5B-Instruct model
- 4-bit quantization enables deployment on resource-constrained environments
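As an illustration of the last point, a 4-bit load via plain transformers + bitsandbytes (an alternative to the Unsloth path shown under Usage below) might look like the following. The repo id is taken from the Usage section; this sketch assumes the repo hosts full model weights rather than only LoRA adapters.

```python
# Alternative 4-bit load with transformers + bitsandbytes
# (the Usage section below shows the Unsloth path instead).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # 4-bit weights
    bnb_4bit_quant_type="nf4",              # NormalFloat4 quantization
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute in bf16
)

model = AutoModelForCausalLM.from_pretrained(
    "Bharatdeep-H/xml_cot_fm_1",  # assumes full weights in this repo
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("Bharatdeep-H/xml_cot_fm_1")
```

From here, `tokenizer.apply_chat_template` and generation work the same way as in the Usage section below.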
### Limitations
- Beyond roughly 5,000 input tokens the model starts regressing. DPO or ORPO on specific cases could improve this; given the limitation, keep function descriptions brief if you want to scale horizontally
- Chain-of-thought reasoning traces can become very lengthy at times. The model can take an instruction header for the CoT to (1) reduce the length of reasoning traces and (2) enhance accuracy by prescribing what goes inside the CoT tags; currently I'm relying on the model's own CoT capability alone (see the sketch after this list)
- Performance may vary compared to larger function calling models
- 1.5B parameter size inherently limits complexity of reasoning compared to larger models
- May struggle with highly complex or multi-step function calling scenarios
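The CoT instruction header mentioned above is untested here; a hypothetical version, extending the `FORMAT_PROMPT` from the Usage section, might look like this:

```python
# Hypothetical CoT instruction header (untested suggestion, not part of
# the original card), extending the FORMAT_PROMPT used in Usage below.
FORMAT_PROMPT_WITH_COT_RULES = """
Respond in the following format:
<chain_of_thought>
...
</chain_of_thought>
<tool_call>
...
</tool_call>

Inside <chain_of_thought>, follow these rules:
1. Keep the reasoning under 100 words.
2. State only: the user's intent, the matching function(s), and the argument values.
"""
```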
## Usage

```python
from unsloth import FastLanguageModel
import torch
from vllm import SamplingParams
max_seq_length = 4096 # Can increase for longer reasoning traces
lora_rank = 32 # Larger rank = smarter, but slower
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "Bharatdeep-H/xml_cot_fm_1",
    # model_name = "unsloth/Qwen2.5-1.5B-Instruct",
    max_seq_length = max_seq_length,
    load_in_4bit = True,  # False for LoRA 16bit
    fast_inference = True,  # Enable vLLM fast inference
    max_lora_rank = lora_rank,
    gpu_memory_utilization = 0.6,  # Reduce if out of memory
)
model = FastLanguageModel.get_peft_model(
    model,
    r = lora_rank,  # Choose any number > 0! Suggested 8, 16, 32, 64, 128
    target_modules = [
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],  # Remove QKVO if out of memory
    lora_alpha = lora_rank,
    use_gradient_checkpointing = "unsloth",  # Enable long context finetuning
    random_state = 3407,
)
# Format definitions
FORMAT_PROMPT = """
Respond in the following format:
<chain_of_thought>
...
</chain_of_thought>
<tool_call>
...
</tool_call>
"""
SYSTEM_MIX_USER_PROMPT = "You are a function calling AI model. You are provided with function signatures within <tools> </tools> XML tags. You may call one or more functions to assist with the user query. Don't make assumptions about what values to plug into functions.\n\n<tools>[[{'type': 'function', 'function': {'name': 'book_appointment', 'description': 'Books an appointment for a patient with a specific dentist at a given date and time.', 'parameters': {'type': 'object', 'properties': {'patient_id': {'type': 'string', 'description': 'The unique identifier for the patient.'}, 'dentist_id': {'type': 'string', 'description': 'The unique identifier for the dentist.'}, 'preferred_date': {'type': 'string', 'description': 'The preferred date for the appointment.'}, 'time_slot': {'type': 'string', 'description': 'The preferred time slot for the appointment.'}}, 'required': ['patient_id', 'dentist_id', 'preferred_date', 'time_slot']}}}, {'type': 'function', 'function': {'name': 'reschedule_appointment', 'description': 'Reschedules an existing appointment to a new date and time.', 'parameters': {'type': 'object', 'properties': {'appointment_id': {'type': 'string', 'description': 'The unique identifier for the existing appointment.'}, 'new_date': {'type': 'string', 'description': 'The new date for the rescheduled appointment.'}, 'new_time_slot': {'type': 'string', 'description': 'The new time slot for the rescheduled appointment.'}}, 'required': ['appointment_id', 'new_date', 'new_time_slot']}}}, {'type': 'function', 'function': {'name': 'cancel_appointment', 'description': 'Cancels an existing appointment.', 'parameters': {'type': 'object', 'properties': {'appointment_id': {'type': 'string', 'description': 'The unique identifier for the appointment to be canceled.'}}, 'required': ['appointment_id']}}}, {'type': 'function', 'function': {'name': 'find_available_time_slots', 'description': 'Finds available time slots for a dentist on a given date.', 'parameters': {'type': 'object', 'properties': {'dentist_id': {'type': 'string', 'description': 'The unique identifier for the dentist.'}, 'date': {'type': 'string', 'description': 'The date to check for available time slots.'}}, 'required': ['dentist_id', 'date']}}}, {'type': 'function', 'function': {'name': 'send_appointment_reminder', 'description': 'Sends an automated reminder to the patient for an upcoming appointment.', 'parameters': {'type': 'object', 'properties': {'appointment_id': {'type': 'string', 'description': 'The unique identifier for the appointment.'}, 'reminder_time': {'type': 'string', 'description': 'The time before the appointment when the reminder should be sent.'}}, 'required': ['appointment_id', 'reminder_time']}}}]]</tools>\n\nFor each user query, you must:\n\n1. First, generate your reasoning within <chain_of_thought> </chain_of_thought> tags. This should explain your analysis of the user's request and how you determined which function(s) to call, or why no appropriate function is available.\n\n2. Then, call the appropriate function(s) by returning a JSON object within <tool_call> </tool_call> tags using the following schema:\n<tool_call>\n{'arguments': <args-dict>, 'name': <function-name>}\n</tool_call>\n\n3. If you determine that none of the provided tools can appropriately resolve the user's query based on the tools' descriptions, you must still provide your reasoning in <chain_of_thought> tags, followed by:\n<tool_call>NO_CALL_AVAILABLE</tool_call>\n\nRemember that your <chain_of_thought> analysis must ALWAYS precede any <tool_call> tags, regardless of whether a suitable function is available."
USER_QUERY = "As the manager of a dental practice, I'm looking to streamline our booking process. I need to schedule an appointment for our patient, John Doe with ID 'p123', with Dr. Sarah Smith, whose dentist ID is 'd456'. Please book this appointment for May 15, 2023, at 2:00 PM. Additionally, I would like to set up an automated reminder for John Doe to ensure he remembers his appointment. Can you book this appointment and arrange for the reminder to be sent out in advance?"
text = tokenizer.apply_chat_template([
    {'role': 'system', 'content': FORMAT_PROMPT},
    {'role': 'user', 'content': SYSTEM_MIX_USER_PROMPT + "\n\nUSER QUERY: " + USER_QUERY}
], tokenize = False, add_generation_prompt = True)
sampling_params = SamplingParams(
    temperature = 0.8,
    top_p = 0.95,
    max_tokens = 1024,
)
output = model.fast_generate(
    text,
    sampling_params = sampling_params,
    # lora_request = model.load_lora("grpo_saved_lora"),
)[0].outputs[0].text
print(output)
```
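To consume the output downstream, the two tagged sections can be split apart with a small parser. This helper is not part of the original card's code; it assumes the output follows the `FORMAT_PROMPT` layout above, and since the schema in the prompt uses Python-style single quotes, it falls back to `ast.literal_eval` when the tool call is not strict JSON.

```python
# Helper sketch (not from the original card): split the model output into
# its chain-of-thought and tool-call parts, assuming the FORMAT_PROMPT layout.
import ast
import json
import re

def parse_response(output: str):
    cot_match = re.search(r"<chain_of_thought>(.*?)</chain_of_thought>", output, re.DOTALL)
    call_match = re.search(r"<tool_call>(.*?)</tool_call>", output, re.DOTALL)
    cot = cot_match.group(1).strip() if cot_match else None
    call_raw = call_match.group(1).strip() if call_match else None

    if call_raw is None or call_raw == "NO_CALL_AVAILABLE":
        return cot, None
    try:
        call = json.loads(call_raw)        # strict JSON
    except json.JSONDecodeError:
        call = ast.literal_eval(call_raw)  # Python-style dict with single quotes
    return cot, call

cot, tool_call = parse_response(output)
print(cot)
print(tool_call)
```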
## Citation

If you intend to use this model for testing, hit me up!