Recommended Sampling Parameters
Hi, I'm wondering if there are recommended sampling settings to get the best output, such as:
TEMPERATURE
TOP_K
TOP_P
MIN_P
REPETITION_PENALTY
There's this "official" preset for general use on Replicate:
OK, I'll see if these settings are any better, although they're not much different from the defaults in llama.cpp.
I've been trying these settings, but it's hard to tell any noticeable difference.
TEMPERATURE = 0.6
TOP_K = 50
TOP_P = 0.9
MIN_P = 0.01
REPETITION_PENALTY = 1.0
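For what it's worth, these settings can also be passed per-request through llama.cpp's OpenAI-compatible endpoint rather than as server flags. A minimal sketch, assuming a local llama-server on 127.0.0.1:8080 as in the curl example below (note that top_k, min_p, and repeat_penalty are llama.cpp extensions to the OpenAI chat-completions schema, not standard fields):

```python
import json
import urllib.request

# The Replicate preset above; top_k, min_p, and repeat_penalty are
# llama.cpp-specific request fields, not part of the OpenAI schema.
SAMPLING = {
    "temperature": 0.6,
    "top_k": 50,
    "top_p": 0.9,
    "min_p": 0.01,
    "repeat_penalty": 1.0,
}

def build_payload(messages, **overrides):
    """Merge the sampling preset into an OpenAI-style chat payload."""
    return {"messages": messages, **SAMPLING, **overrides}

def chat(messages, url="http://127.0.0.1:8080/v1/chat/completions"):
    """POST to a local llama-server (assumed to be running already)."""
    req = urllib.request.Request(
        url,
        data=json.dumps(build_payload(messages)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```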
Also, I can't get it to think with the thinking control:
curl -s http://127.0.0.1:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
"messages": [
{"role": "control", "content": "thinking"},
{"role": "user", "content": "why is the sky blue?"}
]
}' | jq -r .choices[].message.content
unlike on https://www.ibm.com/granite/playground/
Hi @kth8 ! Thanks for experimenting with those sampling parameters. It's great to get feedback on what works (or doesn't).
As for the thinking role, this one is a bit tricky. The model itself uses a chat template that expects to run with apply_chat_template on the client side, and as such doesn't support {"role": "control", "content": "thinking"} out-of-the-box. We've developed extended templates for services that serve the model behind an OpenAI-compatible API and are unable to take thinking as a keyword argument (along with documents and controls). These templates are currently exposed in watsonx.ai and Ollama. Based on your curl sample, I'm guessing you're not running this against Ollama, so you're probably using the out-of-the-box template.
We're still working on the correct public home for these extended templates, but in the meantime, I'll add them below:
jinja2 extended template
{#
------ MESSAGE PARSING ------
#}
{%- set system = namespace(value="") %}
{%- set allDocuments = namespace(rendered="", count=0)%}
{%- set thinkingVar = namespace(enabled=thinking or false) %}
{%- set controls = controls or {} %}
{%- set citationsVar = namespace(enabled='citations' in controls or false) %}
{%- set hallucinationsVar = namespace(enabled='hallucinations' in controls or false) -%}
{%- set lengthVar = namespace(value=controls.length if 'length' in controls else '') -%}
{%- set originalityVar = namespace(value=controls.originality if 'originality' in controls else '') -%}
{# Alias tools -> available_tools #}
{%- if tools and not available_tools -%}
{%- set available_tools = tools -%}
{%- endif -%}
{# Expand kwarg-provided documents #}
{%- for document in (documents or []) -%}
{%- if allDocuments.count > 0 %}
{%- set allDocuments.rendered = allDocuments.rendered + '\n' %}
{%- endif %}
{%- set allDocuments.rendered = allDocuments.rendered + '<|start_of_role|>document {"document_id": "' %}
{%- if 'doc_id' in document %}
{%- set allDocuments.rendered = allDocuments.rendered + document['doc_id'] %}
{%- elif 'title' in document %}
{%- set allDocuments.rendered = allDocuments.rendered + document['title'] %}
{%- else %}
{%- set allDocuments.rendered = allDocuments.rendered + allDocuments.count | string %}
{%- endif %}
{%- set allDocuments.rendered = allDocuments.rendered + '"}<|end_of_role|>\n' + document['text'] + '<|end_of_text|>\n' %}
{%- set allDocuments.count = allDocuments.count + 1 %}
{%- endfor -%}
{# Look through all messages and handle special roles #}
{%- for message in messages -%}
{# User defined system prompt #}
{%- if message['role'] == 'system' %}
{%- set system.value = message['content'] %}
{%- endif -%}
{# Role specified controls #}
{%- if message['role'].startswith('control')%}
{%- if message['content'] == 'thinking'%}
{%- set thinkingVar.enabled = true %}
{%- endif %}
{%- if message['content'] == 'citations'%}
{%- set citationsVar.enabled = true %}
{%- endif %}
{%- if message['content'] == 'hallucinations'%}
{%- set hallucinationsVar.enabled = true %}
{%- endif %}
{%- if ( message['content'].startswith('length ') )%}
{%- set lengthVar.value = message['content'][7:] %}
{%- endif %}
{%- if ( message['content'].startswith('originality ') )%}
{%- set originalityVar.value = message['content'][12:] %}
{%- endif %}
{%- endif -%}
{# Role specified document #}
{%- if (message['role'].startswith('document')) %}
{%- if allDocuments.count > 0 %}
{%- set allDocuments.rendered = allDocuments.rendered + '\n' %}
{%- endif %}
{%- set allDocuments.rendered = allDocuments.rendered + '<|start_of_role|>document {"document_id": "' %}
{%- if 'doc_id' in message %}
{%- set allDocuments.rendered = allDocuments.rendered + message['doc_id'] %}
{%- elif 'title' in message %}
{%- set allDocuments.rendered = allDocuments.rendered + message['title'] %}
{%- else %}
{%- set title = message['role'][8:].strip() %}
{%- if not title %}
{%- set title = allDocuments.count | string %}
{%- endif %}
{%- set allDocuments.rendered = allDocuments.rendered + title %}
{%- endif %}
{%- set allDocuments.rendered = allDocuments.rendered + '"}<|end_of_role|>\n' + message['content'] + '<|end_of_text|>\n' %}
{%- set allDocuments.count = allDocuments.count + 1 %}
{%- endif %}
{%- endfor -%}
{# Build default system prompt if not set #}
{%- if not system.value %}
{%- set system.value = "Knowledge Cutoff Date: April 2024.
Today's Date: " + strftime_now('%B %d, %Y') + ". You are Granite, developed by IBM." %}
{%- if available_tools and allDocuments.rendered %}
{%- set system.value = system.value + " You are a helpful assistant with access to the following tools. When a tool is required to answer the user's query, respond only with <|tool_call|> followed by a JSON list of tools used. If a tool does not exist in the provided list of tools, notify the user that you do not have the ability to fulfill the request.
Write the response to the user's input by strictly aligning with the facts in the provided documents. If the information needed to answer the question is not available in the documents, inform the user that the question cannot be answered based on the available data." %}
{%- elif tools %}
{%- set system.value = system.value + " You are a helpful assistant with access to the following tools. When a tool is required to answer the user's query, respond only with <|tool_call|> followed by a JSON list of tools used. If a tool does not exist in the provided list of tools, notify the user that you do not have the ability to fulfill the request." %}
{%- elif allDocuments.rendered %}
{%- set system.value = system.value + " Write the response to the user's input by strictly aligning with the facts in the provided documents. If the information needed to answer the question is not available in the documents, inform the user that the question cannot be answered based on the available data." %}
{%- elif thinkingVar.enabled %}
{%- set system.value = system.value + " You are a helpful AI assistant.
Respond to every user query in a comprehensive and detailed way. You can write down your thoughts and reasoning process before responding. In the thought process, engage in a comprehensive cycle of analysis, summarization, exploration, reassessment, reflection, backtracing, and iteration to develop well-considered thinking process. In the response section, based on various attempts, explorations, and reflections from the thoughts section, systematically present the final solution that you deem correct. The response should summarize the thought process. Write your thoughts between <think></think> and write your response between <response></response> for each user query." %}
{%- else %}
{%- set system.value = system.value + " You are a helpful AI assistant." %}
{%- endif %}
{%- if allDocuments.rendered and citationsVar.enabled %}
{%- set system.value = system.value + '
Use the symbols <|start_of_cite|> and <|end_of_cite|> to indicate when a fact comes from a document in the search result, e.g <|start_of_cite|> {document_id: 1}my fact <|end_of_cite|> for a fact from document 1. Afterwards, list all the citations with their corresponding documents in an ordered list.' %}
{%- endif %}
{%- if allDocuments.rendered and hallucinationsVar.enabled %}
{%- set system.value = system.value + '
Finally, after the response is written, include a numbered list of sentences from the response with a corresponding risk value that are hallucinated and not based in the documents.' %}
{%- endif %}
{%- endif -%}
{#
------ TEMPLATE EXPANSION ------
#}
{{- '<|start_of_role|>system<|end_of_role|>' + system.value + '<|end_of_text|>
' }}
{%- if available_tools %}
{{- '<|start_of_role|>available_tools<|end_of_role|>' }}
{{- available_tools | tojson(indent=4) }}
{{- '<|end_of_text|>
' }}
{%- endif %}
{%- if allDocuments.rendered %}
{{- allDocuments.rendered }}
{%- endif %}
{%- for message in messages %}
{%- if (
message['role'] not in ['system', 'document', 'control'] and
not message['role'].startswith('document') and
not message['role'].startswith('control')
) %}
{{- '<|start_of_role|>' + message['role'] + '<|end_of_role|>' + message['content'] + '<|end_of_text|>
' }}
{%- endif %}
{%- if loop.last and add_generation_prompt -%}
{{- '<|start_of_role|>assistant' }}
{%- if lengthVar.value and originalityVar.value %}
{{- ' ' + {'length': lengthVar.value, 'originality': originalityVar.value} | tojson() }}
{%- elif lengthVar.value %}
{{- ' ' + {'length': lengthVar.value} | tojson() }}
{%- elif originalityVar.value %}
{{- ' ' + {'originality': originalityVar.value} | tojson() }}
{%- endif %}
{{- '<|end_of_role|>' }}
{%- endif %}
{%- endfor %}
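For anyone porting this to another engine, the MESSAGE PARSING section of the template boils down to roughly the following Python. This is my own sketch of the control/document-role handling only, not the template itself, and it omits the prompt-assembly half:

```python
def parse_special_roles(messages):
    """Scan messages for system/control/document roles, mirroring the
    extended template's parsing pass."""
    flags = {"thinking": False, "citations": False, "hallucinations": False}
    length = originality = ""
    documents = []
    system = ""
    for m in messages:
        role, content = m["role"], m["content"]
        if role == "system":
            system = content
        elif role.startswith("control"):
            if content in flags:
                flags[content] = True
            elif content.startswith("length "):
                length = content[len("length "):]
            elif content.startswith("originality "):
                originality = content[len("originality "):]
        elif role.startswith("document"):
            # Precedence matches the template: explicit doc_id, then title,
            # then a qualifier after "document" in the role, then the index.
            doc_id = (m.get("doc_id") or m.get("title")
                      or role[len("document"):].strip() or str(len(documents)))
            documents.append({"document_id": doc_id, "text": content})
    return system, flags, length, originality, documents
```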
Ollama extended template
{{- /*
------ MESSAGE PARSING ------
*/}}
{{- /*
Declare the prompt structure variables to be filled in from messages
*/}}
{{- $system := "" }}
{{- $documents := "" }}
{{- $documentCounter := 0 }}
{{- $thinking := false }}
{{- $citations := false }}
{{- $hallucinations := false }}
{{- $length := "" }}
{{- $originality := "" }}
{{- /*
Loop over messages and look for a user-provided system message and documents
*/ -}}
{{- range .Messages }}
{{- /* User defined system prompt(s) */}}
{{- if (eq .Role "system")}}
{{- if (ne $system "") }}
{{- $system = print $system "\n\n" }}
{{- end}}
{{- $system = print $system .Content }}
{{- end}}
{{- /*
NOTE: Since Ollama collates consecutive roles, for control and documents, we
work around this by allowing the role to contain a qualifier after the
role string.
*/ -}}
{{- /* Role specified controls */ -}}
{{- if (and (ge (len .Role) 7) (eq (slice .Role 0 7) "control")) }}
{{- if (eq .Content "thinking")}}{{- $thinking = true }}{{- end}}
{{- if (eq .Content "citations")}}{{- $citations = true }}{{- end}}
{{- if (eq .Content "hallucinations")}}{{- $hallucinations = true }}{{- end}}
{{- if (and (ge (len .Content) 7) (eq (slice .Content 0 7) "length "))}}
{{- $length = slice .Content 7 }}
{{- end}}
{{- if (and (ge (len .Content) 12) (eq (slice .Content 0 12) "originality "))}}
{{- $originality = slice .Content 12 }}
{{- end}}
{{- end}}
{{- /* Role specified document */ -}}
{{- if (and (ge (len .Role) 8) (eq (slice .Role 0 8) "document")) }}
{{- if (ne $documentCounter 0)}}
{{- $documents = print $documents "\n\n"}}
{{- end}}
{{- $identifier := ""}}
{{- if (ge (len .Role) 9) }}
{{- $identifier = (slice .Role 9)}}
{{- end}}
{{- if (eq $identifier "") }}
{{- $identifier = print $documentCounter}}
{{- end}}
{{- $documents = print $documents "<|start_of_role|>document {\"document_id\": \"" $identifier "\"}<|end_of_role|>\n" .Content "<|end_of_text|>"}}
{{- $documentCounter = len (printf "a%*s" $documentCounter "")}}
{{- end}}
{{- end}}
{{- /*
If no user message provided, build the default system message
*/ -}}
{{- if eq $system "" }}
{{- $system = "Knowledge Cutoff Date: April 2024.\nYou are Granite, developed by IBM."}}
{{- /* Add Tools prompt */}}
{{- if .Tools }}
{{- $system = print $system " You are a helpful assistant with access to the following tools. When a tool is required to answer the user's query, respond only with <|tool_call|> followed by a JSON list of tools used. If a tool does not exist in the provided list of tools, notify the user that you do not have the ability to fulfill the request." }}
{{- end}}
{{- /* Add documents prompt */}}
{{- if $documents }}
{{- if .Tools }}
{{- $system = print $system "\n"}}
{{- else }}
{{- $system = print $system " "}}
{{- end}}
{{- $system = print $system "Write the response to the user's input by strictly aligning with the facts in the provided documents. If the information needed to answer the question is not available in the documents, inform the user that the question cannot be answered based on the available data." }}
{{- if $citations}}
{{- $system = print $system "\nUse the symbols <|start_of_cite|> and <|end_of_cite|> to indicate when a fact comes from a document in the search result, e.g <|start_of_cite|> {document_id: 1}my fact <|end_of_cite|> for a fact from document 1. Afterwards, list all the citations with their corresponding documents in an ordered list."}}
{{- end}}
{{- if $hallucinations}}
{{- $system = print $system "\nFinally, after the response is written, include a numbered list of sentences from the response with a corresponding risk value that are hallucinated and not based in the documents."}}
{{- end}}
{{- end}}
{{- /* Prompt without tools or documents */}}
{{- if (and (not .Tools) (not $documents)) }}
{{- $system = print $system " You are a helpful AI assistant."}}
{{- if $thinking}}
{{- $system = print $system "\nRespond to every user query in a comprehensive and detailed way. You can write down your thoughts and reasoning process before responding. In the thought process, engage in a comprehensive cycle of analysis, summarization, exploration, reassessment, reflection, backtracing, and iteration to develop well-considered thinking process. In the response section, based on various attempts, explorations, and reflections from the thoughts section, systematically present the final solution that you deem correct. The response should summarize the thought process. Write your thoughts between <think></think> and write your response between <response></response> for each user query."}}
{{- end}}
{{- end}}
{{- end}}
{{- /*
------ TEMPLATE EXPANSION ------
*/}}
{{- /* System Prompt */ -}}
<|start_of_role|>system<|end_of_role|>{{- $system }}<|end_of_text|>
{{- /* Tools */ -}}
{{- if .Tools }}
<|start_of_role|>available_tools<|end_of_role|>[
{{- range $index, $_ := .Tools }}
{{ . }}
{{- if and (ne (len (slice $.Tools $index)) 1) (gt (len $.Tools) 1) }},
{{- end}}
{{- end }}
]<|end_of_text|>
{{- end}}
{{- /* Documents */ -}}
{{- if $documents }}
{{ $documents }}
{{- end}}
{{- /* Standard Messages */}}
{{- range $index, $_ := .Messages }}
{{- if (and
(ne .Role "system")
(or (lt (len .Role) 7) (ne (slice .Role 0 7) "control"))
(or (lt (len .Role) 8) (ne (slice .Role 0 8) "document"))
)}}
<|start_of_role|>
{{- if eq .Role "tool" }}tool_response
{{- else }}{{ .Role }}
{{- end }}<|end_of_role|>
{{- if .Content }}{{ .Content }}
{{- else if .ToolCalls }}<|tool_call|>
{{- range .ToolCalls }}{"name": "{{ .Function.Name }}", "arguments": {{ .Function.Arguments }}}
{{- end }}
{{- end }}
{{- if eq (len (slice $.Messages $index)) 1 }}
{{- if eq .Role "assistant" }}
{{- else }}<|end_of_text|>
<|start_of_role|>assistant
{{- if and (ne $length "") (ne $originality "") }} {"length": "{{ $length }}", "originality": "{{ $originality }}"}
{{- else if ne $length "" }} {"length": "{{ $length }}"}
{{- else if ne $originality "" }} {"originality": "{{ $originality }}"}
{{- end }}<|end_of_role|>
{{- end -}}
{{- else }}<|end_of_text|>
{{- end }}
{{- end }}
{{- end }}
@gabegoodhart I am using the latest build of llama.cpp. I tried using the extended template you provided, but it wouldn't load, with the error:
Failed to generate tool call example: Unknown method: startswith at row 45, column 28:
I asked my AI about it, which responded with:
the error Unknown method: startswith indicates that the Jinja2 environment provided by llama-server (or the specific version of the Jinja2 library it uses) doesn't recognize the .startswith() string method directly within the template expressions.
I then asked the AI to fix the template, which allowed llama.cpp to load it, but running the curl command above didn't enable thinking and broke other stuff. I tried using the thinking control with Ollama, which worked without problems.
curl -s http://localhost:11434/api/chat -d '{
"model": "granite3.3:2b",
"messages": [
{"role": "control","content": "thinking"},
{"role": "user","content": "why is the sky blue?"}
],
"stream": false
}' | jq -r .message.content
Looking at the extended template, couldn't we just take the system prompt it builds for the thinking control and use it directly as a system message to enable thinking? For example:
#!/usr/bin/env python3
from llama_cpp import Llama
repo = "ibm-granite/granite-3.3-2b-instruct-GGUF"
system_prompt = """
You are a helpful AI assistant.
Respond to every user query in a comprehensive and detailed way. You can write down your thoughts and reasoning process before responding. In the thought process, engage in a comprehensive cycle of analysis, summarization, exploration, reassessment, reflection, backtracing, and iteration to develop well-considered thinking process. In the response section, based on various attempts, explorations, and reflections from the thoughts section, systematically present the final solution that you deem correct. The response should summarize the thought process. Write your thoughts between <think></think> and write your response between <response></response> for each user query.
""".strip()
def load_model(repo_id):
return Llama.from_pretrained(
repo_id=repo_id,
filename="*Q4_K_M.gguf",
local_dir=".",
n_ctx=4096,
verbose=False,
n_gpu_layers=-1
)
def generate_response(model):
return model.create_chat_completion(
messages=[
{"role": "system", "content": system_prompt},
{"role": "user", "content": "Why is the sky blue?"}
],
temperature=0.6,
top_k=50,
top_p=0.9,
min_p=0.01,
repeat_penalty=1.0,
stream=True
)
def process_response(response):
for chunk in response:
content = chunk['choices'][0]['delta'].get('content', '')
if content:
print(content, end='', flush=True)
def main():
model = load_model(repo)
response = generate_response(model)
process_response(response)
if __name__ == "__main__":
main()
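Since that system prompt makes the model wrap its output in <think></think> and <response></response> tags, the trace and the answer can be split apart client-side. A small helper of my own (not part of any Granite API):

```python
import re

def split_thinking(text):
    """Separate the <think> trace from the <response> body.
    Falls back to returning the raw text as the response if no tags appear."""
    think = re.search(r"<think>(.*?)</think>", text, re.DOTALL)
    response = re.search(r"<response>(.*?)</response>", text, re.DOTALL)
    return (
        think.group(1).strip() if think else "",
        response.group(1).strip() if response else text.strip(),
    )
```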
I tried using the MLX version but it also didn't enable thinking.
from mlx_lm import load, stream_generate
from mlx_lm.sample_utils import make_sampler
model, tokenizer = load("mlx-community/granite-3.3-2b-instruct-4bit")
if hasattr(tokenizer, "apply_chat_template") and tokenizer.chat_template is not None:
messages = [
{"role": "control", "content": "thinking"},
{"role": "user", "content": "Why is the sky blue?"}
]
sampler = make_sampler(temp=0.6, top_k=50, top_p=0.90, min_p=0.01)
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)
stream = stream_generate(model=model, tokenizer=tokenizer, prompt=prompt, sampler=sampler)
for chunk in stream:
if chunk:
print(chunk.text, end="", flush=True)
I asked my AI about it, which responded with:
the error Unknown method: startswith indicates that the Jinja2 environment provided by llama-server (or the specific version of the Jinja2 library it uses)
That's a smart AI! Yes, the extended template is intended for use with the Python jinja2 implementation and uses some Python-specific functionality like startswith. I don't have a detailed comparison with the jinja2 engine used in llama.cpp, but I'm not at all surprised that it wouldn't support all of this.
Looking at the extended template, we can just take the system prompt from the control role and use it directly to enable thinking?
Yep, this is probably the easiest way to do it with llama.cpp's jinja2 engine at this point.
I tried using the MLX version but it also didn't enable thinking.
Yep, this is likely the same issue: the ability to pass the control role is only implemented in the template, and the MLX version doesn't support it. I think the approach you found of manually constructing the system prompt on the client side is the most robust workaround.
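To make that workaround reusable against any server running the stock template, one could rewrite control-role messages into the equivalent system message before sending. A sketch of my own; THINKING_PROMPT is abridged here and should be replaced with the full thinking system prompt quoted from the extended template above:

```python
# Abridged stand-in for the full thinking system prompt from the
# extended template; substitute the complete text in real use.
THINKING_PROMPT = (
    "You are a helpful AI assistant.\n"
    "Respond to every user query in a comprehensive and detailed way. "
    "Write your thoughts between <think></think> and write your response "
    "between <response></response> for each user query."
)

def expand_controls(messages):
    """Replace {'role': 'control', 'content': 'thinking'} with an explicit
    system message, and drop any other control-role messages the server
    wouldn't understand."""
    thinking = any(
        m["role"].startswith("control") and m["content"] == "thinking"
        for m in messages
    )
    kept = [m for m in messages if not m["role"].startswith("control")]
    if thinking and not any(m["role"] == "system" for m in kept):
        kept.insert(0, {"role": "system", "content": THINKING_PROMPT})
    return kept
```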