Unable to serve using OpenVINO Model Server
I used the model server Docker image (openvino/model_server:2025.1-gpu) to serve this model and got the following error:
[2025-04-28 16:54:19.976][64][modelmanager][error][modelinstance.cpp:842] Cannot compile model into target device; error: Exception from src/inference/src/cpp/core.cpp:109:
Exception from src/inference/src/dev/plugin.cpp:53:
Exception from src/plugins/intel_gpu/src/plugin/program_builder.cpp:225:
Operation: VocabDecoder_136799 of type VocabDecoder(extension) is not supported
I also created an issue on GitHub: https://github.com/openvinotoolkit/model_server/issues/3263
OVMS use cases are often production-oriented. Maybe this won't work for you, but with my project OpenArc I haven't had issues running this model (or the other DeepSeek distills I have converted) on GPU.
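For reference, direct GPU inference on a converted model with OpenVINO GenAI looks roughly like the sketch below. The model directory and prompt are placeholders, and this is not necessarily how OpenArc handles it internally:

```python
# Sketch: direct GPU inference with OpenVINO GenAI (bypassing OVMS).
# "model_dir" is a placeholder for the folder holding the converted
# OpenVINO IR files (openvino_model.xml/.bin plus tokenizer files).
import openvino_genai as ov_genai

model_dir = "./ov-model-dir"                   # placeholder path
pipe = ov_genai.LLMPipeline(model_dir, "GPU")  # target the Intel GPU plugin

print(pipe.generate("What is OpenVINO?", max_new_tokens=128))
```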
Looks cool. I will give it a try. Thanks.
Here is the documented procedure for deploying LLM models in OVMS: https://docs.openvino.ai/nightly/model-server/ovms_docs_llm_quickstart.html
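Once the model is exported and the server is running per that guide, it can be queried through the OpenAI-compatible API. A minimal sketch follows; the port, the "/v3" base path, and the model name are assumptions based on the linked quickstart, so check the guide for the exact values in your deployment:

```python
# Sketch: querying an OVMS text-generation deployment through its
# OpenAI-compatible REST API, assuming REST is exposed on port 8000
# and the model was registered under the name "deepseek".
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v3", api_key="unused")

response = client.chat.completions.create(
    model="deepseek",  # must match the name used when exporting/serving
    messages=[{"role": "user", "content": "Hello, what is OpenVINO?"}],
    max_tokens=128,
)
print(response.choices[0].message.content)
```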
Nice! Be sure to join the Discord linked in the repo.