Unsupported Jinja template in EXAONE 4.0 llama.cpp fork

#1
by InfernalDread - opened

I followed the instructions on how to build your custom branch to run the GGUF file, but I get this error for the Jinja template, which results in the model not outputting relevant information when queried by the user:

Failed to generate tool call example: Unknown role: user at row 42, column 28:
{%- if role not in role_indicators %}
{{- raise_exception('Unknown role: ' ~ role) }}
^
{%- endif %}
at row 42, column 9:
{%- if role not in role_indicators %}
{{- raise_exception('Unknown role: ' ~ role) }}
^
{%- endif %}
at row 41, column 42:
{%- set role = msg.role %}
{%- if role not in role_indicators %}
^
{{- raise_exception('Unknown role: ' ~ role) }}
at row 41, column 5:
{%- set role = msg.role %}
{%- if role not in role_indicators %}
^
{{- raise_exception('Unknown role: ' ~ role) }}
at row 38, column 41:

{%- for i in range(messages | length) %}
^
{%- set msg = messages[i] %}
at row 38, column 1:

{%- for i in range(messages | length) %}
^
{%- set msg = messages[i] %}
at row 1, column 1:
{%- if not skip_think is defined %}
^
{%- set skip_think = true %}

srv load_model: load_model: Chat template parsing error: Unknown role: user at row 42, column 28
srv load_model: load_model: The chat template that comes with this model is not yet supported, falling back to chatml. This may cause the model to output suboptimal responses

LG AI Research org

Hello @InfernalDread, thank you for bringing this to our attention.

We ran into the same error when using EXAONE 4.0's default chat template, so please use a simplified version instead (it can also be found in our PR; see the second code block at the link).

Here is the simplified version. Let me know if you encounter any other issues.

{%- set end_of_turn = '[|endofturn|]\n' %}

{%- macro available_tools(tools) %}
    {{- "# Available Tools" }}
    {{- "\nYou can use none, one, or multiple of the following tools by calling them as functions to help with the user’s query." }}
    {{- "\nHere are the tools available to you in JSON format within <tool> and </tool> tags:\n" }}
    {%- for tool in tools %}
        {{- "<tool>" }}
        {{- tool | tojson | safe }}
        {{- "</tool>\n" }}
    {%- endfor %}

    {{- "\nFor each function call you want to make, return a JSON object with function name and arguments within <tool_call> and </tool_call> tags, like:" }}
    {{- "\n<tool_call>{\"name\": function_1_name, \"arguments\": {argument_1_name: argument_1_value, argument_2_name: argument_2_value}}</tool_call>" }}
    {{- "\n<tool_call>{\"name\": function_2_name, \"arguments\": {...}}</tool_call>\n..." }}
    {{- "\nNote that if no argument name is specified for a tool, you can just print the argument value directly, without the argument name or JSON formatting." }}
{%- endmacro %}


{%- set ns = namespace(last_query_index = messages|length - 1) %}
{%- for message in messages %}
    {%- if message.role == "user" and message.content is string %}
        {%- set ns.last_query_index = loop.index0 -%}
    {%- endif %}
{%- endfor %}

{%- for i in range(messages | length) %}
    {%- set msg = messages[i] %}
    {%- set role = msg.role %}
    
    {%- if i == 0 %}
        {%- if role == 'system' %}
            {{- "[|system|]" }}
            {{- msg.content }}
            {%- if tools is defined and tools %}
                {{- "\n\n" }}{{- available_tools(tools) }}
            {%- endif %}
            {{- end_of_turn -}}
            {%- continue %}
        {%- elif tools is defined and tools %}            
            {{- "[|system|]" }}
            {{- available_tools(tools) }}
            {{- end_of_turn -}}            
        {%- endif %}
    {%- endif %}

    {%- if role == 'assistant' %}
        {{- "[|assistant|]" }}

        {%- if msg.content %}        
            {%- if "</think>" in msg.content %}
                {%- set content = msg.content.split('</think>')[-1].strip() %}
                {%- set reasoning_content = msg.content.split('</think>')[0].strip() %}
                {%- if reasoning_content.startswith("<think>") %}
                    {%- set reasoning_content = reasoning_content[9:].strip() %}
                {%- endif %}
            {%- else %}
                {%- set content = msg.content %}
            {%- endif %}

            {%- if msg.reasoning_content %}
                {%- set reasoning_content = msg.reasoning_content %}
            {%- endif %}

            {{- content }}
        {%- endif %}

        {%- if msg.tool_calls %}
            {%- if msg.content %}
                {{- "\n" }}
            {%- else %}
                {{- "<think>\n\n</think>\n\n" }}
            {%- endif %}
            {%- for tool_call in msg.tool_calls %}
                {%- if tool_call.function is defined %}
                    {%- set tool_call = tool_call.function %}
                {%- endif %}

                {%- if tool_call.arguments is defined %}
                    {%- set arguments = tool_call.arguments %}
                {%- elif tool_call.parameters is defined %}
                    {%- set arguments = tool_call.parameters %}
                {%- else %}
                    {{- raise_exception('arguments or parameters are mandatory: ' ~ tool_call) }}
                {%- endif %}

                {{- "<tool_call>" }}{"name": "{{- tool_call.name }}", "arguments": {{ arguments | tojson | safe }}}{{- "</tool_call>" }}

                {%- if not loop.last %}
                    {{- "\n" }}
                {%- endif %}

            {%- endfor %}
        {%- endif %}
        {{- end_of_turn -}}

    {%- elif role == "tool" %}
        {%- if i == 0 or messages[i - 1].role != "tool" %}
            {{- "[|tool|]" }}
        {%- endif %}
        {%- if msg.content is defined %}            
            {{- "<tool_result>" }}{"result": {{ msg.content | tojson | safe }}}{{- "</tool_result>" }}            
        {%- endif %}
        {%- if loop.last or messages[i + 1].role != "tool" %}
            {{- end_of_turn -}}
        {%- else %}
            {{- "\n" }}
        {%- endif %}

    {%- else %}
        {{- "[|user|]" }}
        {{- msg.content }}
        {{- end_of_turn -}}
    {%- endif %}
{% endfor %}


{%- if add_generation_prompt %}
    {{- "[|assistant|]" }}
    {%- if enable_thinking is defined and enable_thinking is true %}
        {{- "<think>\n" }}
    {%- else %}
        {{- "<think>\n\n</think>\n\n" }}
    {%- endif %}
{%- endif %}
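
For reference, with a minimal conversation (one system message and one user turn, enable_thinking unset), this template should render roughly as follows; the exact whitespace can vary slightly depending on the Jinja engine's trim settings:

[|system|]You are a helpful assistant.[|endofturn|]
[|user|]Hi[|endofturn|]
[|assistant|]<think>

</think>
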

OK, no more errors when using this simplified template. But now all that happens on the server web page is that it shows the three moving dots and utilizes my GPU, but then it doesn't show the output response; it's just a blank page.

LG AI Research org

Have you tested with a shorter max_tokens setting? We've observed that generation becomes noticeably slower when we don't explicitly set max_tokens in the requests to llama-server.

What is the command-line argument for that again? As for the max context size (-c), I set mine to 30,000.

EDIT: I realized it was the "-n" argument. I set it to 5000, but I have the same issue: the model finishes the response but doesn't show it to me on the website. Also, there is no streaming of the tokens at all; is that normal?

Another EDIT: I reloaded the model, and after waiting a really long time it started streaming the tokens. I'm not sure why it took so long, since I'm using an RTX 4090, but at least it works now.

Maybe last EDIT: The model's performance seems to degrade A LOT while using this simplified Jinja template; even just asking it to create a butterfly with SVG code results in a mess. Just wanted to let you guys know.

LG AI Research org
edited Jul 15

Please set the max_tokens parameter on your OpenAI chat completion requests (/v1/chat/completions).

You can find example code for it in the Quickstart.
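
For instance, here is a minimal sketch of such a request using Python; the port (llama-server defaults to 8080), the model name, and the token limit are placeholders to adapt to your setup:

import requests

# Send a chat completion request to llama-server's OpenAI-compatible endpoint.
# Setting max_tokens explicitly bounds the generation length so the response returns promptly.
response = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "model": "exaone-4.0",  # placeholder; adjust to your server's model name
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Create a butterfly with SVG code."},
        ],
        "max_tokens": 1024,  # explicit cap on generated tokens
    },
    timeout=600,
)
print(response.json()["choices"][0]["message"]["content"])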

I can confirm what @InfernalDread observes: the content inside <think></think> isn't shown when using the llama.cpp server's built-in web UI, but it is shown when using OpenWebUI, for example.

LG AI Research org

We've updated the chat template in the repository and confirmed that it works well with llama-server.

Please refer to our updated guide and relevant discussion.

nuxlear changed discussion status to closed
