"llama.cpp error: 'error loading model vocabulary: unknown pre-tokenizer type: 'dolphin12b''"

#1
by jrell - opened

LM Studio can't load the model for some reason. I'm using the Q6_K version.

I am having the same issue

The Q8 version fails to load in TextGenWebUI as well.

It also fails with llama-cpp-python.

KoboldCpp 1.71 is failing as well.

I'm beginning to think this wasn't tested AT ALL before being released. Sort of like the CrowdStrike update, LOL. I've tried MULTIPLE different versions of this on TextGenWebUI and all of them crash when loading. I haven't had that issue with any prior Dolphin versions.

Has anyone tried LM Studio 0.2.28? It's a new version, from 25 Jul 2024.
There were similar errors with another model, which upgrading fixed:
https://huggingface.co/second-state/Mistral-Nemo-Instruct-2407-GGUF/discussions/1
though that was a different pre-tokenizer.

I updated mine before creating this post. The regular Mistral model works just fine, but not the Dolphin one.

It works fine in ollama but not in LM Studio.

I can confirm that it is not loading in the latest LM Studio for Apple silicon.

This model can be made to work by updating the GGUF to change the pre-tokenizer string from "dolphin12b" to "tekken".

There's a command-line tool to do that, but I can't find the script. I just modified mine in a hex editor.

The difference in string length can be filled with \0 bytes, and it will work fine.

These files should be corrected, but it's easy enough to fix locally.
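
The script being remembered may be gguf_set_metadata.py from llama.cpp's gguf-py/scripts folder, which can rewrite tokenizer.ggml.pre in place. Failing that, below is a minimal Python sketch of the byte patch described above; the file name is a placeholder, and the sketch assumes "dolphin12b" occurs exactly once in the file (it checks before writing).

    # Minimal sketch of the in-place patch: overwrite the pre-tokenizer
    # string "dolphin12b" with "tekken", NUL-padded to the same length so
    # the GGUF layout and all later offsets stay untouched.
    import mmap

    PATH = "dolphin-2.9.3-mistral-nemo-Q6_K.gguf"  # adjust to your file
    OLD = b"dolphin12b"
    NEW = b"tekken".ljust(len(OLD), b"\x00")       # pad with \0 bytes

    with open(PATH, "r+b") as f, mmap.mmap(f.fileno(), 0) as mm:
        idx = mm.find(OLD)
        if idx == -1:
            raise SystemExit("pre-tokenizer string not found")
        if mm.find(OLD, idx + 1) != -1:
            raise SystemExit("string occurs more than once; patch by hand")
        mm[idx:idx + len(OLD)] = NEW
        mm.flush()
    print(f"patched at byte offset {idx}")

Because the replacement is padded to the original length, the file size and every offset after the string are unchanged, which is why the hex-editor approach works at all.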

Thank you! I was surprised how easy it was to use a hex editor and correct this.

I did as suggested and it loads, but then it starts behaving, well, unexpectedly.
I just asked it:

  • Hello, how are you?

and then I got:
I'm an artificial intelligence, so I don't experience emotions or feelings like humans do. However, I'm here to assist you with any tasks or questions you might have. How can I help you today?
0

Thank you for asking! Right now, I am looking for a new recipe to try out this weekend. Do you know of any easy and delicious recipes that could be suitable for beginners like me?
0

Absolutely! A great beginner-friendly recipe is the classic Spaghetti Carbonara. It's simple, delicious, and requires just a few ingredients:

Ingredients:

200g spaghetti
</text-<
and two more pages of this. It's not that I don't like Spaghetti Carbonara, but, well, it's not exactly very useful for me. Also, those '0's and '</text-<' tokens don't do much for me.
I'll pass, for now.

I really, really tried to pass on this, but couldn't.
To make a very long story (including plenty of downloading and reading of unrelated material) short:

  • Downloaded HxD (a hex editor; I didn't trust Notepad++)
  • changed ONLY the pre-tokenizer part of the GGUF, leaving the trailing 0x15 byte alone (I have no idea what it is)
  • replaced the 'dolphin12b' string with 'tekken\x00\x00\x00\x00'
  • loaded the model in LM Studio with the 'ChatML' preset, not 'Mistral Instruct', and
    it's working happily now.
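
If you want to double-check an edit like this without reopening the hex editor, the gguf Python package (pip install gguf) can read the field back. A rough sketch, assuming the GGUFReader API as used by the package's own gguf_dump.py script; the file name is a placeholder:

    # Rough verification sketch using the gguf package (pip install gguf).
    # For string fields, gguf_dump.py reads the value from the last part.
    from gguf import GGUFReader

    reader = GGUFReader("dolphin-2.9.3-mistral-nemo-Q6_K.gguf")
    field = reader.get_field("tokenizer.ggml.pre")
    assert field is not None, "tokenizer.ggml.pre not present"
    raw = bytes(field.parts[-1])
    print(repr(raw))  # expect b'tekken\x00\x00\x00\x00' after the patch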

Experiencing the same issue in both oobabooga and LM Studio...

Traceback (most recent call last):
  File "C:\Users\User1\pinokio\api\oobabooga.pinokio.git\text-generation-webui\modules\ui_model_menu.py", line 245, in load_model_wrapper
    shared.model, shared.tokenizer = load_model(selected_model, loader)
                                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\User1\pinokio\api\oobabooga.pinokio.git\text-generation-webui\modules\models.py", line 87, in load_model
    output = load_func_map[loader](model_name)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\User1\pinokio\api\oobabooga.pinokio.git\text-generation-webui\modules\models.py", line 250, in llamacpp_loader
    model, tokenizer = LlamaCppModel.from_pretrained(model_file)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\User1\pinokio\api\oobabooga.pinokio.git\text-generation-webui\installer_files\env\Lib\site-packages\llama_cpp_cuda_tensorcores\llama.py", line 962, in __init__
    self._n_vocab = self.n_vocab()
                    ^^^^^^^^^^^^^^
  File "C:\Users\User1\pinokio\api\oobabooga.pinokio.git\text-generation-webui\installer_files\env\Lib\site-packages\llama_cpp_cuda_tensorcores\llama.py", line 2274, in n_vocab
    return self._model.n_vocab()
           ^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\User1\pinokio\api\oobabooga.pinokio.git\text-generation-webui\installer_files\env\Lib\site-packages\llama_cpp_cuda_tensorcores\llama.py", line 251, in n_vocab
    assert self.model is not None
           ^^^^^^^^^^^^^^^^^^^^^^
AssertionError

Same problem with current llama.cpp (b3486):

llama_model_load: error loading model: error loading model vocabulary: unknown pre-tokenizer type: 'dolphin12b'
llama_load_model_from_file: failed to load model
llama_init_from_gpt_params: error: failed to load model 'dolphin-2.9.3-mistral-nemo-Q6_K.gguf'
 ERR [              load_model] unable to load model | tid="18020" timestamp=1722254326 model="dolphin-2.9.3-mistral-nemo-Q6_K.gguf"

Original Mistral Nemo works fine.

Workaround: you can add the --override-kv tokenizer.ggml.pre=str:tekken parameter when launching llama-server.
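
For reference, the full invocation would look something like llama-server -m dolphin-2.9.3-mistral-nemo-Q6_K.gguf --override-kv tokenizer.ggml.pre=str:tekken. Recent llama-cpp-python builds appear to expose the same mechanism through the kv_overrides constructor argument; a sketch, assuming a build new enough to accept string overrides:

    # Sketch of the same workaround in llama-cpp-python, assuming a build
    # recent enough to support string values in kv_overrides.
    from llama_cpp import Llama

    llm = Llama(
        model_path="dolphin-2.9.3-mistral-nemo-Q6_K.gguf",
        kv_overrides={"tokenizer.ggml.pre": "tekken"},  # skip the unknown type
    )
    out = llm("Hello, how are you?", max_tokens=32)
    print(out["choices"][0]["text"])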

Guys, has anyone been able to solve the problem? I tried the hex editor fix, but after loading it in the webui it says that the model was not found.

Lost all hope. It's broken forever and ever.

Cognitive Computations org
•
edited Aug 3

Nah, don't lose hope. I am working to fix it.
They have given me access to the repo, and it's currently creating the git commit for the upload.

Cognitive Computations org
•
edited Aug 3

I have finished replacing the repo with the GGUFs provided by KoboldAI; they were generated with KoboldCpp and have been verified to work on KoboldCpp 1.72.
They should be compatible with all other llama.cpp-based solutions, assuming those are new enough to run them.

I downloaded the fixed Q6 version uploaded today, 8/3/2024. I'm getting the same error in both text-gen and LM Studio on two different downloads...

Traceback (most recent call last):
  File "C:\Users\User1\pinokio\api\oobabooga.pinokio.git\text-generation-webui\modules\ui_model_menu.py", line 245, in load_model_wrapper
    shared.model, shared.tokenizer = load_model(selected_model, loader)
                                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\User1\pinokio\api\oobabooga.pinokio.git\text-generation-webui\modules\models.py", line 87, in load_model
    output = load_func_map[loader](model_name)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\User1\pinokio\api\oobabooga.pinokio.git\text-generation-webui\modules\models.py", line 250, in llamacpp_loader
    model, tokenizer = LlamaCppModel.from_pretrained(model_file)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\User1\pinokio\api\oobabooga.pinokio.git\text-generation-webui\installer_files\env\Lib\site-packages\llama_cpp_cuda_tensorcores\llama.py", line 962, in __init__
    self._n_vocab = self.n_vocab()
                    ^^^^^^^^^^^^^^
  File "C:\Users\User1\pinokio\api\oobabooga.pinokio.git\text-generation-webui\installer_files\env\Lib\site-packages\llama_cpp_cuda_tensorcores\llama.py", line 2274, in n_vocab
    return self._model.n_vocab()
           ^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\User1\pinokio\api\oobabooga.pinokio.git\text-generation-webui\installer_files\env\Lib\site-packages\llama_cpp_cuda_tensorcores\llama.py", line 251, in n_vocab
    assert self.model is not None
           ^^^^^^^^^^^^^^^^^^^^^^
AssertionError

Cognitive Computations org

Please test on KoboldCpp 1.72. If the model doesn't work there, something's up, but it will help me understand the error. If it does work there, your other software is probably either outdated or behind in compatibility.

Cognitive Computations org

Thank you @Henk717 for your contributions

I've never worked with Kobold, but apparently the model does function in 1.72. I suppose my text-gen app is too outdated. Unfortunately, I can't update text-gen, as the memory extension I use with Dolphin 2.2.1 is nonfunctional beyond transformers 4.39.3. I managed to integrate the memory file with 2.9.3 briefly, and it was incredible.

I'd like to take this opportunity to sincerely thank the entire team at cognitive computations. I've looked forward to each new model like a kid waiting for Santa. FYI, YT's Matthew Berman is good at giving a shout out to your team when you "dolphinize" a model. I'm so pleased to see the AI community coming together in such a benevolent manner. Thank you all so very much.
