Text Generation · GGUF · English · quant · experimental · conversational

[BUG] Model does not support tools in Ollama

#1 opened by liemkg1234

Error

{
    "error": {
        "message": "hf.co/eaddario/Hammer2.1-7b-GGUF:Q4_K_M does not support tools",
        "type": "api_error",
        "param": null,
        "code": null
    }
}
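For context: Ollama refuses a request with "tools" when the chat template it registered for the model has no tool-call handling, so the error is raised before the model is ever invoked. A quick way to check is to print the stored template (my suggestion, not part of the original report):

# Print the chat template Ollama registered for this model; if it has no
# tool/tool-call sections, Ollama rejects requests that include "tools".
ollama show --template hf.co/eaddario/Hammer2.1-7b-GGUF:Q4_K_M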

curl

curl --location 'http://localhost:1111/v1/chat/completions' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer empty' \
--data '{
    "model": "hf.co/eaddario/Hammer2.1-7b-GGUF:Q4_K_M",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant that can use tools."
      },
      {
        "role": "user",
        "content": "What'\''s the weather like in San Francisco?"
      }
    ],
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "get_weather",
          "description": "Get the current weather for a location",
          "parameters": {
            "type": "object",
            "properties": {
              "location": {
                "type": "string",
                "description": "The city and state, e.g. San Francisco, CA"
              },
              "unit": {
                "type": "string",
                "enum": ["celsius", "fahrenheit"],
                "description": "The unit of temperature to return"
              }
            },
            "required": ["location"]
          }
        }
      }
    ],
    "tool_choice": "auto"
  }'
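The same request can be reproduced against Ollama's native endpoint to confirm the rejection comes from Ollama's capability check rather than the OpenAI-compatibility layer (a sketch, not from the original report):

curl http://localhost:1111/api/chat -d '{
  "model": "hf.co/eaddario/Hammer2.1-7b-GGUF:Q4_K_M",
  "stream": false,
  "messages": [
    {"role": "user", "content": "What is the weather in San Francisco?"}
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get the current weather for a location",
        "parameters": {
          "type": "object",
          "properties": {
            "location": {"type": "string"}
          },
          "required": ["location"]
        }
      }
    }
  ]
}'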

Docker

  llm:
    image: ollama/ollama:latest
    container_name: llm
    ports:
      - "1111:8000"
    volumes:
      - ./llm/ollama/llm.entrypoint.sh:/root/entrypoint.sh
      - ./volumes/llm/models/llm:/root/.ollama
    environment:
      - OLLAMA_HOST=http://0.0.0.0:8000
      - OLLAMA_KEEP_ALIVE=-1
      - OLLAMA_CONTEXT_LENGTH=8192
    entrypoint: ["/bin/sh", "/root/entrypoint.sh"]
    runtime: nvidia
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
    pull_policy: always
    tty: true
    restart: always
    profiles:
      - llm
    networks:
      - llm-lab
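The compose file exposes the in-container listener (OLLAMA_HOST on port 8000) as port 1111 on the host, which matches the curl above. A quick smoke test once the container is up (my suggestion, untested against this exact setup):

# Bring the service up under its profile, then hit the OpenAI-compatible
# endpoint; the model should be listed once the entrypoint has pulled it.
docker compose --profile llm up -d
curl http://localhost:1111/v1/models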

entrypoint

#!/bin/bash

echo "Starting Ollama server..."
ollama serve &
sleep 5

ollama run hf.co/eaddario/Hammer2.1-7b-GGUF:Q4_K_M # Qwen 2.5 fine-tuned for function calling
ollama list

tail -f /dev/null
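One note on the script itself, unrelated to the tools error: ollama run starts an interactive session, so ollama pull is the usual non-interactive way to fetch a model in an entrypoint (suggested substitution):

# Non-interactive alternative to `ollama run` for an entrypoint script:
ollama pull hf.co/eaddario/Hammer2.1-7b-GGUF:Q4_K_M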

Haven't used Ollama in a long time, but I'll check it out. I wouldn't be surprised if it's related to the Jinja chat template, though.
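If the template does turn out to be the culprit, a possible workaround (an untested sketch, not something this thread confirms) is to re-register the already-pulled GGUF under a template that declares tool support, e.g. by borrowing the template Ollama ships for the official qwen2.5 model, since Hammer 2.1 is Qwen 2.5 based; "hammer2.1-tools" is an arbitrary local name:

# Grab a known tool-capable template (qwen2.5's handles tool calls).
ollama pull qwen2.5:7b
ollama show --template qwen2.5:7b > qwen25.tmpl

# Build a Modelfile that reuses the GGUF weights with that template.
{
  echo 'FROM hf.co/eaddario/Hammer2.1-7b-GGUF:Q4_K_M'
  printf 'TEMPLATE """%s"""\n' "$(cat qwen25.tmpl)"
} > Modelfile

ollama create hammer2.1-tools -f Modelfile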
