`tokenizer.model` is missing

#30
by happyme531 - opened

As the title says, none of the Phi4 models seem to include this tokenizer file, even though the architecture is the same as Phi3. However, some third-party deployment frameworks need it.

Having a similar problem for a fine-tuned model that uses tokenizer.json.

Microsoft org

@elTigre914 @happyme531
If you have inference errors, please double-check your environment and dependencies. This is what we suggest:
https://huggingface.co/microsoft/Phi-4-multimodal-instruct#requirements

Alternatively you can look at this dockerfile
https://github.com/anastasiosyal/phi4-multimodal-instruct-server/blob/main/dockerfile

For fine-tuning, you could take a look at the Korean fine-tuning writeup contributed by the community:
https://huggingface.co/microsoft/Phi-4-multimodal-instruct#appendix-b-fine-tuning-korean-speech

But how can we get a tokenizer.model anyway?

Microsoft org

Hi @happyme531 ,

We are using the GPT-4o tokenizer converted from tiktoken.
So we have a tokenizer.json file, but we don't support tokenizer.model.

Can you try using the tokenizers library?
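
For anyone who has not used it, here is a minimal sketch of loading a tokenizer.json with the tokenizers library. The toy vocabulary and the temporary file are assumptions made only so the snippet runs offline; for a real model you would skip the toy step and pass the path of the repository's tokenizer.json to `Tokenizer.from_file`.

```python
import os
import tempfile

from tokenizers import Tokenizer
from tokenizers.models import WordLevel
from tokenizers.pre_tokenizers import Whitespace

# Toy tokenizer (hypothetical three-word vocabulary) so the example is offline.
# With a real model, skip this and load the tokenizer.json from the repository.
toy = Tokenizer(
    WordLevel(vocab={"[UNK]": 0, "hello": 1, "world": 2}, unk_token="[UNK]")
)
toy.pre_tokenizer = Whitespace()

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "tokenizer.json")
    toy.save(path)                          # writes a tokenizer.json file
    tokenizer = Tokenizer.from_file(path)   # the call you would use on a real file

encoding = tokenizer.encode("hello world")
ids = encoding.ids
tokens = encoding.tokens
print(ids, tokens)
```

No tokenizer.model is involved at any point here: everything the library needs is read from the JSON file.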

The tokenizers library works, but there are third-party frameworks that rely on tokenizer.model. Is there any way to generate one from the existing tokenizer?

Microsoft org

@happyme531

tokenizer.json is the recommended way to maintain all the tokenizer information.

https://huggingface.co/docs/transformers/main/en/tiktoken
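
To see what tokenizer.json actually carries, it can be inspected with nothing but the standard library. The sketch below is illustrative only: `summarize_tokenizer_json` is a hypothetical helper, and the inline sample is a hand-written stand-in, not the real Phi-4 file. It assumes the usual layout in which the vocabulary and merges live under the `model` key and special tokens under `added_tokens`.

```python
import json


def summarize_tokenizer_json(data):
    """Summarize a parsed tokenizer.json dictionary (hypothetical helper)."""
    model = data.get("model", {})
    return {
        "model_type": model.get("type"),
        "vocab_size": len(model.get("vocab", {})),
        "num_merges": len(model.get("merges", [])),
        "added_tokens": [t["content"] for t in data.get("added_tokens", [])],
    }


# Hand-written stand-in for a real file. For an actual model, replace this with:
#   with open("tokenizer.json", encoding="utf-8") as f:
#       sample = json.load(f)
sample = json.loads("""
{
  "version": "1.0",
  "added_tokens": [{"id": 3, "content": "<|endoftext|>", "special": true}],
  "model": {
    "type": "BPE",
    "vocab": {"a": 0, "b": 1, "ab": 2},
    "merges": ["a b"]
  }
}
""")

summary = summarize_tokenizer_json(sample)
print(summary)
```

A framework that insists on tokenizer.model is typically expecting a SentencePiece model file, which is a different format from this JSON, so checking `model_type` is a quick way to see why the two are not interchangeable.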
