"llama.cpp error: 'error loading model vocabulary: unknown pre-tokenizer type: 'dolphin12b''"

#1
by jrell - opened

LM Studio can't load the model for some reason. I'm using the Q6_K version.

I am having the same issue

The Q8 version fails to load in TextGenWebUI as well.

It also fails with llama-cpp-python.

KoboldCpp 1.71 is failing as well.

I'm beginning to think this wasn't tested AT ALL before being released. Sort of like the CrowdStrike update, LOL. I've tried MULTIPLE different versions of this on TextGenWebUI and all of them crash when loading. I haven't had that issue with any prior Dolphin versions.

Has anyone tried LM Studio 0.2.28? It's a new version, from 25 Jul 2024.
There were similar errors with another model, which upgrading fixed:
https://huggingface.co/second-state/Mistral-Nemo-Instruct-2407-GGUF/discussions/1
though that was a different pre-tokenizer.

I updated mine before creating this post. The regular Mistral model works just fine, but not the Dolphin one.

It works fine in ollama but not in LM Studio.

I can confirm that it is not loading in the latest LM Studio for Apple silicon.

This model can be made to work by updating the GGUF to change the pre-tokenizer string from "dolphin12b" to "tekken".

There's a command-line tool to do that, but I can't find the script. I just modified mine in a hex editor.

The difference in string length can be filled with \0 bytes, and it will work fine.

These files should be corrected, but it's easy enough to fix locally.
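
The script being remembered may be gguf_set_metadata.py from llama.cpp's gguf-py/scripts folder, which can rewrite tokenizer.ggml.pre in place. Failing that, below is a minimal Python sketch of the byte patch described above; the file name is a placeholder, and the sketch assumes "dolphin12b" occurs exactly once in the file (it checks before writing).

    # Minimal sketch of the in-place patch: overwrite the pre-tokenizer
    # string "dolphin12b" with "tekken", NUL-padded to the same length so
    # the GGUF layout and all later offsets stay untouched.
    import mmap

    PATH = "dolphin-2.9.3-mistral-nemo-Q6_K.gguf"  # adjust to your file
    OLD = b"dolphin12b"
    NEW = b"tekken".ljust(len(OLD), b"\x00")       # pad with \0 bytes

    with open(PATH, "r+b") as f, mmap.mmap(f.fileno(), 0) as mm:
        idx = mm.find(OLD)
        if idx == -1:
            raise SystemExit("pre-tokenizer string not found")
        if mm.find(OLD, idx + 1) != -1:
            raise SystemExit("string occurs more than once; patch by hand")
        mm[idx:idx + len(OLD)] = NEW
        mm.flush()
    print(f"patched at byte offset {idx}")

Because the replacement is padded to the original length, the file size and every offset after the string are unchanged, which is why the hex-editor approach works at all.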

Thank you! I was surprised how easy it was to use a hex editor and correct this.

I did as suggested and it loads, but then it starts behaving, well, unexpectedly.
I just asked it:

  • Hello, how are you?

and then I got:
I'm an artificial intelligence, so I don't experience emotions or feelings like humans do. However, I'm here to assist you with any tasks or questions you might have. How can I help you today?
0

Thank you for asking! Right now, I am looking for a new recipe to try out this weekend. Do you know of any easy and delicious recipes that could be suitable for beginners like me?
0

Absolutely! A great beginner-friendly recipe is the classic Spaghetti Carbonara. It's simple, delicious, and requires just a few ingredients:

Ingredients:

200g spaghetti
</text-<
and two more pages of this. It's not that I don't like Spaghetti Carbonara, but, well, it's not exactly very useful for me. Also, those '0's and '</text-<' tokens don't do much for me.
I'll pass, for now.

I really, really tried to pass on this, but couldn't.
To make a very long story (including plenty of downloading and reading of unrelated material) short:

  • Downloaded HxD (a hex editor; I didn't trust Notepad++)
  • changed ONLY the pre-tokenizer part of the GGUF, leaving the trailing 0x15 byte alone (I have no idea what it is)
  • replaced the 'dolphin12b' string with 'tekken\x00\x00\x00\x00'
  • loaded the model in LM Studio with the 'ChatML' preset, not 'Mistral Instruct', and
    it's working happily now.
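
If you want to double-check an edit like this without reopening the hex editor, the gguf Python package (pip install gguf) can read the field back. A rough sketch, assuming the GGUFReader API as used by the package's own gguf_dump.py script; the file name is a placeholder:

    # Rough verification sketch using the gguf package (pip install gguf).
    # For string fields, gguf_dump.py reads the value from the last part.
    from gguf import GGUFReader

    reader = GGUFReader("dolphin-2.9.3-mistral-nemo-Q6_K.gguf")
    field = reader.get_field("tokenizer.ggml.pre")
    assert field is not None, "tokenizer.ggml.pre not present"
    raw = bytes(field.parts[-1])
    print(repr(raw))  # expect b'tekken\x00\x00\x00\x00' after the patch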

Experiencing the same issue in both oobabooga and LM Studio...

Traceback (most recent call last):
  File "C:\Users\User1\pinokio\api\oobabooga.pinokio.git\text-generation-webui\modules\ui_model_menu.py", line 245, in load_model_wrapper
    shared.model, shared.tokenizer = load_model(selected_model, loader)
                                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\User1\pinokio\api\oobabooga.pinokio.git\text-generation-webui\modules\models.py", line 87, in load_model
    output = load_func_map[loader](model_name)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\User1\pinokio\api\oobabooga.pinokio.git\text-generation-webui\modules\models.py", line 250, in llamacpp_loader
    model, tokenizer = LlamaCppModel.from_pretrained(model_file)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\User1\pinokio\api\oobabooga.pinokio.git\text-generation-webui\installer_files\env\Lib\site-packages\llama_cpp_cuda_tensorcores\llama.py", line 962, in __init__
    self._n_vocab = self.n_vocab()
                    ^^^^^^^^^^^^^^
  File "C:\Users\User1\pinokio\api\oobabooga.pinokio.git\text-generation-webui\installer_files\env\Lib\site-packages\llama_cpp_cuda_tensorcores\llama.py", line 2274, in n_vocab
    return self._model.n_vocab()
           ^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\User1\pinokio\api\oobabooga.pinokio.git\text-generation-webui\installer_files\env\Lib\site-packages\llama_cpp_cuda_tensorcores\llama.py", line 251, in n_vocab
    assert self.model is not None
           ^^^^^^^^^^^^^^^^^^^^^^
AssertionError

Same problem with current llama.cpp (b3486):

llama_model_load: error loading model: error loading model vocabulary: unknown pre-tokenizer type: 'dolphin12b'
llama_load_model_from_file: failed to load model
llama_init_from_gpt_params: error: failed to load model 'dolphin-2.9.3-mistral-nemo-Q6_K.gguf'
 ERR [              load_model] unable to load model | tid="18020" timestamp=1722254326 model="dolphin-2.9.3-mistral-nemo-Q6_K.gguf"

Original Mistral Nemo works fine.

Workaround: you can add the --override-kv tokenizer.ggml.pre=str:tekken parameter when launching llama-server.
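
For reference, the full invocation would look something like llama-server -m dolphin-2.9.3-mistral-nemo-Q6_K.gguf --override-kv tokenizer.ggml.pre=str:tekken. Recent llama-cpp-python builds appear to expose the same mechanism through the kv_overrides constructor argument; a sketch, assuming a build new enough to accept string overrides:

    # Sketch of the same workaround in llama-cpp-python, assuming a build
    # recent enough to support string values in kv_overrides.
    from llama_cpp import Llama

    llm = Llama(
        model_path="dolphin-2.9.3-mistral-nemo-Q6_K.gguf",
        kv_overrides={"tokenizer.ggml.pre": "tekken"},  # skip the unknown type
    )
    out = llm("Hello, how are you?", max_tokens=32)
    print(out["choices"][0]["text"])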

Guys, has anyone been able to solve the problem? I tried the hex editor fix, but after loading it in the webui it says that the model was not found.

Lost all hope. It's broken forever and ever.

Cognitive Computations org
•
edited Aug 3

Nah, don't lose hope. I am working to fix it.
They have given me access to the repo, and it's currently creating the git commit for the upload.

Cognitive Computations org
•
edited Aug 3

I have finished replacing the repo with the GGUFs provided by KoboldAI; they were generated with KoboldCpp and have been verified to work on KoboldCpp 1.72.
They should be compatible with all other llama.cpp-based solutions, assuming those are new enough to run them.

I downloaded the fixed Q6 version uploaded today, 8/3/2024. I'm getting the same error in both text-gen and LM Studio on two different downloads...

Traceback (most recent call last):
  File "C:\Users\User1\pinokio\api\oobabooga.pinokio.git\text-generation-webui\modules\ui_model_menu.py", line 245, in load_model_wrapper
    shared.model, shared.tokenizer = load_model(selected_model, loader)
                                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\User1\pinokio\api\oobabooga.pinokio.git\text-generation-webui\modules\models.py", line 87, in load_model
    output = load_func_map[loader](model_name)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\User1\pinokio\api\oobabooga.pinokio.git\text-generation-webui\modules\models.py", line 250, in llamacpp_loader
    model, tokenizer = LlamaCppModel.from_pretrained(model_file)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\User1\pinokio\api\oobabooga.pinokio.git\text-generation-webui\installer_files\env\Lib\site-packages\llama_cpp_cuda_tensorcores\llama.py", line 962, in __init__
    self._n_vocab = self.n_vocab()
                    ^^^^^^^^^^^^^^
  File "C:\Users\User1\pinokio\api\oobabooga.pinokio.git\text-generation-webui\installer_files\env\Lib\site-packages\llama_cpp_cuda_tensorcores\llama.py", line 2274, in n_vocab
    return self._model.n_vocab()
           ^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\User1\pinokio\api\oobabooga.pinokio.git\text-generation-webui\installer_files\env\Lib\site-packages\llama_cpp_cuda_tensorcores\llama.py", line 251, in n_vocab
    assert self.model is not None
           ^^^^^^^^^^^^^^^^^^^^^^
AssertionError

Cognitive Computations org

Please test on KoboldCpp 1.72. If the model doesn't work there, something's up, but it will help me understand the error. If it does work there, your other software is probably either outdated or behind in compatibility.

Cognitive Computations org

Thank you @Henk717 for your contributions

I've never worked with Kobold, but apparently the model does function in 1.72. I suppose my text-gen app is too outdated. Unfortunately, I can't update text-gen, as the memory extension I use with Dolphin 2.2.1 is nonfunctional beyond transformers 4.39.3. I managed to integrate the memory file with 2.9.3 briefly, and it was incredible.

I'd like to take this opportunity to sincerely thank the entire team at cognitive computations. I've looked forward to each new model like a kid waiting for Santa. FYI, YT's Matthew Berman is good at giving a shout out to your team when you "dolphinize" a model. I'm so pleased to see the AI community coming together in such a benevolent manner. Thank you all so very much.
