Unable to serve using OpenVINO Model Server

#1
by KYLN24 - opened

I used the model server Docker image (openvino/model_server:2025.1-gpu) to serve this model and got the following error:

[2025-04-28 16:54:19.976][64][modelmanager][error][modelinstance.cpp:842] Cannot compile model into target device; error: Exception from src/inference/src/cpp/core.cpp:109:
Exception from src/inference/src/dev/plugin.cpp:53:
Exception from src/plugins/intel_gpu/src/plugin/program_builder.cpp:225:
Operation: VocabDecoder_136799 of type VocabDecoder(extension) is not supported
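
For context, VocabDecoder is not a core OpenVINO op; it comes from the openvino_tokenizers extension and lives in the detokenizer sub-model, which the CPU plugin can run but the GPU plugin apparently cannot. A minimal sketch that reproduces the same compile failure outside OVMS (the file name and device strings are assumptions based on the usual export layout):

import openvino as ov
import openvino_tokenizers  # importing this registers VocabDecoder and the other tokenizer extension ops

core = ov.Core()

# The detokenizer sub-model is the one carrying VocabDecoder; this file name is the
# conventional one produced by the optimum-cli / openvino_tokenizers export.
detok = core.read_model("openvino_detokenizer.xml")

core.compile_model(detok, "CPU")  # succeeds: the CPU plugin evaluates the extension ops
core.compile_model(detok, "GPU")  # expected to raise the same "VocabDecoder ... is not supported" error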

I also created an issue on GitHub: https://github.com/openvinotoolkit/model_server/issues/3263

OpenVINO Toolkit org

OVMS use cases are often production-based. Maybe this won't work for you, but with my project OpenArc I haven't had issues running this model (or the other DeepSeek distills I have converted) on GPU.

https://github.com/SearchSavior/OpenArc

Looks cool. I will give it a try. Thanks.

Here is the documented procedure for deploying LLM models in OVMS: https://docs.openvino.ai/nightly/model-server/ovms_docs_llm_quickstart.html
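
Once a model is deployed following that quickstart, OVMS exposes an OpenAI-compatible chat completions endpoint. A rough sketch of a client call (the port, API prefix, and model name are assumptions and depend on your deployment config):

import requests

# Assumed values: match the port to your --rest_port and the model name to the one registered in your config.
base_url = "http://localhost:8000/v3"
payload = {
    "model": "DeepSeek-R1-Distill-Qwen-7B",  # hypothetical model name
    "messages": [{"role": "user", "content": "Hello, what can you do?"}],
    "max_tokens": 128,
}

resp = requests.post(f"{base_url}/chat/completions", json=payload, timeout=120)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])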

OpenVINO Toolkit org

Nice! Be sure to join the Discord linked in the repo.
