HuggingChat: Input validation error: `inputs` tokens + `max_new_tokens` must be..

#430
by Kostyak - opened

I use the meta-llama/Meta-Llama-3-70B-Instruct model. After a certain number of messages, the AI refuses to respond and gives an error: "Input validation error: inputs tokens + max_new_tokens must be <= 8192. Given: 6391 inputs tokens and 2047 max_new_tokens". Is this a bug or some new limitation? I still don't get it, to be honest, and I hope I get an answer here. I'm new to this site.


Same issue all of a sudden today

Can you see if this still happens? Should be fixed now.


Still the same error, except the numbers have changed a little.
[Screenshot attached: Screenshot_20.png]

I keep getting this error as well. Using CohereForAI

Same error, "Meta-Llama-3-70B-Instruct" model.

I have also been running into this error. Is there a workaround or solution at all?

"Input validation error: inputs tokens + max_new_tokens must be <= 8192. Given: 6474 inputs tokens and 2047 max_new_tokens"

Using the meta-llama/Meta-Llama-3-70B-Instruct model.

@datoreviol @bocahpekael99 if one of you could share one of the conversations where this happens, that would help us a lot with debugging!

@datoreviol @bocahpekael99

Hi guys,

LLMs have a limited context window, that is, a limited amount of text they can process at once. If this limit is exceeded, you typically get the error you are seeing. The limit in your case is 8192 tokens.

What counts towards this limit is the input text PLUS the output text. The input text is your prompt, which can add up to a lot of tokens once the conversation history (or RAG context) is included (around 6.4k in your case). The output text is what you are asking the LLM to generate as an answer, which is 2047 tokens in your case. So you are basically asking the model to process more text at once than it is able to.
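
To make the arithmetic concrete, here is a minimal sketch of how you can count tokens yourself and reproduce the check behind the error message. It assumes the `transformers` library and access to the (gated) Llama 3 tokenizer; the `prompt` string is a placeholder and the two limits are taken from the error above:

```python
# Minimal sketch: reproduce the "inputs tokens + max_new_tokens <= limit" check.
# Assumes `transformers` is installed and you have access to the gated Llama 3 repo.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-70B-Instruct")

prompt = "..."           # your full prompt, including the conversation history
max_new_tokens = 2047    # generation budget from the error message
context_window = 8192    # Llama 3's context length (the 8192 in the error)

input_tokens = len(tokenizer(prompt)["input_ids"])
if input_tokens + max_new_tokens > context_window:
    print(f"Too long: {input_tokens} + {max_new_tokens} > {context_window}")
else:
    print(f"OK: {input_tokens} + {max_new_tokens} <= {context_window}")
```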

To fix the error, you have to reduce the amount of text you are asking the LLM to process. You can use any of these approaches:

  • reduce the input size (write a shorter prompt, start a new conversation so less history is sent, or, if you are doing RAG, return fewer chunks from your database and use smaller chunks to begin with)
  • reduce the size of the answer you are asking the LLM to write. 2047 tokens is quite a lot for a chatbot reply; do you really need that much? Try 1024 or 512 (see the sketch after this list).
  • when calling an LLM through the free HuggingFace API, it seems the max context window is set to a lower value than what the model can actually handle. If that is how you are hitting this limit, consider creating a (paid) dedicated Inference Endpoint instead, which lets you configure a bigger context window if the model supports one.
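
If you are hitting the same limit through the Inference API rather than the HuggingChat UI, lowering the generation budget is a one-line change. A minimal sketch with `huggingface_hub` (the model ID and numbers come from the errors above; the prompt is a placeholder):

```python
# Minimal sketch: same request with a smaller generation budget.
# Assumes `huggingface_hub` is installed and an HF token is configured.
from huggingface_hub import InferenceClient

client = InferenceClient(model="meta-llama/Meta-Llama-3-70B-Instruct")

# With an 8192-token window, a 6391-token prompt leaves at most
# 8192 - 6391 = 1801 tokens for the answer, so 2047 cannot fit.
answer = client.text_generation(
    "your prompt here",
    max_new_tokens=512,  # comfortably under the remaining budget
)
print(answer)
```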

Hope this helps.

TBH this shouldn't be happening; the backend should automatically truncate the input if you exceed the context window. That's why I wanted a conversation, so I can see where the issue is.
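
For anyone calling the model directly instead of through HuggingChat, client-side truncation is roughly the sketch below: drop the oldest turns until the prompt plus the generation budget fits in the window. The `history` list, the helper name, and the default numbers are hypothetical placeholders, not HuggingChat's actual logic:

```python
# Sketch of client-side truncation: drop the oldest turns until the prompt fits.
# `tokenizer` is a chat-template tokenizer (e.g. from transformers); `history` is a
# list of {"role": ..., "content": ...} dicts -- both are placeholder names.
def truncate_history(tokenizer, history, context_window=8192, max_new_tokens=512):
    while len(history) > 1:
        ids = tokenizer.apply_chat_template(history, add_generation_prompt=True)
        if len(ids) + max_new_tokens <= context_window:
            break
        # Drop the oldest non-system message; keep the system prompt at index 0.
        drop_at = 1 if history[0]["role"] == "system" else 0
        history.pop(drop_at)
    return history
```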

What I don't understand is the following:
When the input limit is reached, how long do we have to wait before we can continue asking the model / agent questions?

Still happening to me

If this message shows up in a chat, do we have to move on to a new chat? Because this one isn't taking any more inputs.

I'm using Google Gemma for a chat and this error has popped up…

Error forwarded from backend: Input validation error: inputs tokens + max_new_tokens must be <= 4096. Given: 4582 inputs tokens and 0 max_new_tokens

Is there a way to fix it to allow the chat to continue or is it dead?

If there is a fix what would I need to do?

I've cleared all my chats and typed "L", and it is giving me
[Error forwarded from backend: Input validation error: inputs tokens + max_new_tokens must be <= 131072. Given: 227472 inputs tokens and 0 max_new_tokens]
Then I typed "L" and "Yes or No", and it gives me
[Error forwarded from backend: Input validation error: inputs tokens + max_new_tokens must be <= 131072. Given: 227476 inputs tokens and 0 max_new_tokens]

:( I've tried the app and the browser, changing models, and it's the same all of a sudden.

I RESET MY MODELS AND DELETED MY ASSISTANTS AND IT'S WORKING NOW in a new chat. Any older chats give the same message :(
