The tokenizer has changed, just FYI

#2
by bullerwins - opened

That link leads to something...Forbidden. 404.

Hmm. Tokenizer. This Llama 3.1 70B instruct exl2 here came with its own tokenizer. Do I have to do anything special to get it to work with Ooba?

Maybe you are gated from the repo?

To make it work with Ooba you need to update exllamav2; turboderp has just updated the main branch to support it. I tested yesterday with the dev branch and it was working fine (I used TabbyAPI, though).

OK, thanks.

Waiting for access, has anyone rehosted it yet?

https://huggingface.co/SillyTilly

That has the new tokenizer? They just gave me access and it was only updated a few hours ago. Some changes related to BOS and special tokens map only.

It does not have the updated tokenizer.

It's been a month and I just stumbled over this, too. So these Llama 3.1 EXL2 quants by @turboderp, and also those by @LoneStriker, don't have the updated tokenizer. Only @bullerwins's quants have been updated with it.

So should we consider these older quants obsolete, will they get updated, or is it actually not an issue? I'm sure most of us would prefer to run the best possible version of Llama 3.1 so what's the consensus here?

I believe it does matter. I haven't run benchmarks with exl2, but with the GGUF I ran an A/B test with and without the fixed tokenizer, and the MMLU-Pro benchmark gave consistently better results with the fixed one.

I actually just listened to last week's ThursdAI podcast and noticed that the 405B model got updated and the Nous team had to retrain. I checked and it only affected the 405B, so the 8B and 70B models with the fixed tokenizer and chat templates would be the best. For exl2, a chat template change doesn't need a requant; just update tokenizer_config.json.
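For anyone who only wants the chat template change, here is a minimal sketch of what "just update the tokenizer_config.json" could look like in Python. It assumes you have already downloaded the updated tokenizer_config.json from the upstream repo and that both files store the template under the standard `chat_template` key. The function name and the paths are hypothetical, not something from exllamav2 or Ooba.

```python
import json
from pathlib import Path


def update_chat_template(updated_config: Path, local_config: Path) -> bool:
    """Copy the chat_template field from updated_config into local_config.

    All other keys in local_config are left untouched.
    Returns True if the local file was rewritten, False if nothing changed.
    """
    updated = json.loads(updated_config.read_text(encoding="utf-8"))
    local = json.loads(local_config.read_text(encoding="utf-8"))

    new_template = updated.get("chat_template")
    # Nothing to do if upstream has no template or the local one already matches.
    if new_template is None or local.get("chat_template") == new_template:
        return False

    local["chat_template"] = new_template
    local_config.write_text(
        json.dumps(local, indent=2, ensure_ascii=False) + "\n",
        encoding="utf-8",
    )
    return True
```

Called as `update_chat_template(Path("new/tokenizer_config.json"), Path("my-exl2-quant/tokenizer_config.json"))` (both paths are placeholders), it leaves the quantized weights alone, which matches the point that no requant is needed for a template change.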

Note: the chat template has gotten two updates; my models have the first update but not the second. The second one is related to tool calling, so it won't matter if you don't use that. I'll update my models today; I can hit you up here or on Twitter if you want.

I believe that is all

@bullerwins Thanks for the info! And yes, please let me know when you've updated, so I can update my local copies. I'd rather stay up to date now than have issues later.

Feel free to contact me on Twitter, too, I'll follow you (if I don't already) and retweet your update note. Always good to spread useful information.

I just pull the new tokenizer files in and replace the old ones. Seems to work fine.
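A rough sketch of that "pull and replace" step, assuming the updated files are already downloaded into a local folder. The file list below is my assumption about which files usually make up a Hugging Face tokenizer, and the function name and backup suffix are made up for illustration. It keeps a backup copy of each file it overwrites so you can roll back.

```python
import shutil
from pathlib import Path

# Assumed set of tokenizer-related files; adjust to what the repo actually ships.
TOKENIZER_FILES = (
    "tokenizer.json",
    "tokenizer_config.json",
    "special_tokens_map.json",
)


def replace_tokenizer_files(new_dir, quant_dir, backup_suffix=".bak"):
    """Copy tokenizer files from new_dir into quant_dir.

    Existing files in quant_dir are backed up first (name + backup_suffix).
    Returns the list of file names that were copied.
    """
    new_dir, quant_dir = Path(new_dir), Path(quant_dir)
    replaced = []
    for name in TOKENIZER_FILES:
        src = new_dir / name
        if not src.is_file():
            continue  # upstream does not ship this file; skip it
        dst = quant_dir / name
        if dst.exists():
            shutil.copy2(dst, dst.with_name(dst.name + backup_suffix))
        shutil.copy2(src, dst)
        replaced.append(name)
    return replaced
```

The returned list tells you which files were actually swapped, so you can check that nothing you expected was silently missing from the download.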
