Just a high-bits-per-weight (bpw) GGUF quantization of functionary, intended as a drop-in replacement for OpenAI function calling. See the llama-cpp-python server docs:

https://llama-cpp-python.readthedocs.io/en/latest/server/
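
For example, here is a minimal sketch of pointing the official OpenAI client at a local llama-cpp-python server loaded with this model. The GGUF filename, port, and the `get_weather` tool are assumptions for illustration, not part of this repo:

```python
# Start the server first (shell), assuming the 8-bit GGUF file name below:
#   python3 -m llama_cpp.server --model ./functionary-7b.Q8_0.gguf --chat_format functionary

from openai import OpenAI

# The local server does not check API keys; any placeholder string works.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

# Hypothetical tool definition, purely for illustration.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

response = client.chat.completions.create(
    model="functionary",  # the local server typically ignores this name
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
)
print(response.choices[0].message.tool_calls)
```

Because the server exposes the OpenAI chat-completions API, existing function-calling code should work unchanged apart from the `base_url`.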

GGUF model details:

- Architecture: llama
- Model size: 6.74B params
- Quantizations: 6-bit, 8-bit