Should Phi-3V be supported in llama.cpp?
I've tried to convert the Phi-3V HF model files to GGUF via llama.cpp, but it looks like convert-hf-to-gguf.py only supports Phi-3 as "Phi3ForCausalLM". So I copied that code path, registered the copy as "Phi3VForCausalLM", and copied tokenizer.model over from Phi-3-128k-instruct.
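Roughly, the copied section looked like this (a sketch from memory, assuming the @Model.register decorator and Phi3MiniModel class that convert-hf-to-gguf.py already uses for "Phi3ForCausalLM"):

# Sketch: reuse the existing Phi-3 text converter, registered under the
# Phi-3V architecture name from config.json.
@Model.register("Phi3VForCausalLM")
class Phi3VModel(Phi3MiniModel):
    model_arch = gguf.MODEL_ARCH.PHI3

With that change, llama.cpp starts converting the model: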
...
INFO:gguf.gguf_writer:gguf: This GGUF file is for Little Endian only
INFO:hf-to-gguf:Set model parameters
INFO:hf-to-gguf:Set model tokenizer
INFO:gguf.vocab:Setting special token type bos to 1
INFO:gguf.vocab:Setting special token type eos to 32000
INFO:gguf.vocab:Setting special token type unk to 0
INFO:gguf.vocab:Setting special token type pad to 32000
INFO:gguf.vocab:Setting add_bos_token to True
INFO:gguf.vocab:Setting add_eos_token to False
INFO:gguf.vocab:Setting chat_template to {% for message in messages %}{{'<|' + message['role'] + '|>' + '
' + message['content'] + '<|end|>
' }}{% endfor %}{% if add_generation_prompt and messages[-1]['role'] != 'assistant' %}{{- '<|assistant|>
' -}}{% endif %}
INFO:hf-to-gguf:Exporting model to 'converted.bin'
INFO:hf-to-gguf:gguf: loading model weight map from 'model.safetensors.index.json'
INFO:hf-to-gguf:gguf: loading model part 'model-00001-of-00002.safetensors'
...
But the process terminated with this error:
...
Traceback (most recent call last):
  File "llm/llama.cpp/convert-hf-to-gguf-v.py", line 2778, in <module>
    main()
  File "llm/llama.cpp/convert-hf-to-gguf-v.py", line 2772, in main
    model_instance.write()
  File "llm/llama.cpp/convert-hf-to-gguf-v.py", line 328, in write
    self.write_tensors()
  File "llm/llama.cpp/convert-hf-to-gguf-v.py", line 264, in write_tensors
    for new_name, data in ((n, d.squeeze().numpy()) for n, d in self.modify_tensors(data_torch, name, bid)):
  File "llm/llama.cpp/convert-hf-to-gguf-v.py", line 231, in modify_tensors
    return [(self.map_tensor_name(name), data_torch)]
  File "llm/llama.cpp/convert-hf-to-gguf-v.py", line 182, in map_tensor_name
    raise ValueError(f"Can not map tensor {name!r}")
ValueError: Can not map tensor 'model.vision_embed_tokens.glb_GN'
It looks like the current convert-hf-to-gguf.py can't map the vision-embedding tensors into the new file. How can I fix this? Could you please help with converting the Phi-3V HF model files to GGUF?
Thank you very much!
After opening tensor_mapping.py, I found lots of MODEL_TENSOR definitions in the dict, but I'm not sure which tensor type the vision weights in Phi-3V, such as "model.vision_embed_tokens", should map to.
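For illustration, extending the sketch above to simply drop the unmapped vision tensors would at least let the text weights through (a hypothetical workaround that yields a text-only GGUF, not real vision support):

@Model.register("Phi3VForCausalLM")
class Phi3VModel(Phi3MiniModel):
    def modify_tensors(self, data_torch, name, bid):
        # Skip the vision tower and its separator embeddings (glb_GN etc.)
        # instead of letting map_tensor_name() raise "Can not map tensor".
        if "vision_embed_tokens" in name:
            return []
        return super().modify_tensors(data_torch, name, bid)

Mapping them properly would need new MODEL_TENSOR entries plus actual vision support on the llama.cpp side.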
I think even the pure-text version of Phi-3-128k-instruct is not supported by llama.cpp yet, because of its LongRoPE module.
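The blocker is visible right in the HF config: Phi-3-128k carries a LongRoPE-style rope_scaling block with per-dimension factors that GGUF's RoPE metadata had no equivalent for. A quick way to see it (path and field names as I remember them from the Phi-3 release, so treat them as assumptions):

import json

# Assumed path and field names: Phi-3-128k's config.json contains a
# rope_scaling block with separate long/short per-dimension factor lists.
with open("Phi-3-mini-128k-instruct/config.json") as f:
    cfg = json.load(f)

scaling = cfg.get("rope_scaling") or {}
print(scaling.get("type"))                  # "longrope" (earlier releases said "su")
print(len(scaling.get("long_factor", [])))  # one scaling factor per rotary dim pair
print(len(scaling.get("short_factor", [])))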
Are there any official quants of vision models? The closest I can find is a handful of LLaVA quants.
By now llama.cpp is such an important inference engine that I think foundation models like these from big corps should also come with a proper PR into llama.cpp.
llama.cpp (and therefore ollama and all the other derivatives) does not support Phi-3V at this point, because it uses a different sort of projector and different image preprocessing.
Demo inference works if you use an unofficial PR, but it doesn't correctly preprocess the images and doesn't match the reference projector.
It's just too much work for the small multimodal developer community to catch up; Microsoft would need to lend a hand.
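To give a sense of the preprocessing gap: Phi-3V tiles high-resolution images into fixed-size sub-crops plus a global view, joined by learned separator embeddings (the glb_GN/sub_GN tensors the converter chokes on). A rough sketch of the tiling step, as an assumption rather than the reference code:

import torch
import torch.nn.functional as F

def hd_tiling_sketch(image: torch.Tensor, crop: int = 336) -> list[torch.Tensor]:
    # Pad the (C, H, W) image so both sides are multiples of the crop size,
    # then cut it into crop x crop tiles. The real model additionally keeps a
    # downscaled global view and inserts glb_GN/sub_GN separators between rows
    # and between the two views; none of that exists in LLaVA-style pipelines.
    _, h, w = image.shape
    padded = F.pad(image, (0, (-w) % crop, 0, (-h) % crop))
    return [padded[:, i:i + crop, j:j + crop]
            for i in range(0, padded.shape[1], crop)
            for j in range(0, padded.shape[2], crop)]

# e.g. a 3x672x1008 image -> 6 tiles of 3x336x336, plus the global view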
Agree. Even SLMs like Phi-3/Phi-3.5 can run on ONNX Runtime but still aren't supported on open-source platforms like llama.cpp. And some devices/OSes, like macOS on the x86 arch, don't get ONNX Runtime in time (e.g. the onnxruntime and onnxruntime-genai Python libs). Techniques like LongRoPE, Flash Attention, and the CLIP ViT projector need to be supported by open-source platforms like llama.cpp ASAP. Microsoft needs to lend a hand to the open-source communities to make it happen.