Unable to serve using OpenVINO Model Server
I used the model server Docker image (openvino/model_server:2025.1-gpu) to serve this model and got the following error:
[2025-04-28 16:54:19.976][64][modelmanager][error][modelinstance.cpp:842] Cannot compile model into target device; error: Exception from src/inference/src/cpp/core.cpp:109:
Exception from src/inference/src/dev/plugin.cpp:53:
Exception from src/plugins/intel_gpu/src/plugin/program_builder.cpp:225:
Operation: VocabDecoder_136799 of type VocabDecoder(extension) is not supported
I also created an issue on GitHub: https://github.com/openvinotoolkit/model_server/issues/3263
OVMS use cases are often production-oriented. Maybe this won't work for you, but with my project OpenArc I haven't had issues running this model (or the other DeepSeek distills I have converted) on GPU.
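For reference, direct GPU inference on a converted model with OpenVINO GenAI looks roughly like the sketch below. The model directory and prompt are placeholders, and this is not necessarily how OpenArc handles it internally:

```python
# Sketch: direct GPU inference with OpenVINO GenAI (bypassing OVMS).
# "model_dir" is a placeholder for the folder holding the converted
# OpenVINO IR files (openvino_model.xml/.bin plus tokenizer files).
import openvino_genai as ov_genai

model_dir = "./ov-model-dir"                   # placeholder path
pipe = ov_genai.LLMPipeline(model_dir, "GPU")  # target the Intel GPU plugin

print(pipe.generate("What is OpenVINO?", max_new_tokens=128))
```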
Looks cool. I will give it a try. Thanks.
Here is the documented procedure for deploying LLM models in OVMS: https://docs.openvino.ai/nightly/model-server/ovms_docs_llm_quickstart.html
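Once the model is exported and the server is running per that guide, it can be queried through the OpenAI-compatible API. A minimal sketch follows; the port, the "/v3" base path, and the model name are assumptions based on the linked quickstart, so check the guide for the exact values in your deployment:

```python
# Sketch: querying an OVMS text-generation deployment through its
# OpenAI-compatible REST API, assuming REST is exposed on port 8000
# and the model was registered under the name "deepseek".
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v3", api_key="unused")

response = client.chat.completions.create(
    model="deepseek",  # must match the name used when exporting/serving
    messages=[{"role": "user", "content": "Hello, what is OpenVINO?"}],
    max_tokens=128,
)
print(response.choices[0].message.content)
```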
Nice! Be sure to join the Discord linked in the repo.