Are there any other frameworks tested besides transformers that can be deployed?
#5 opened by DarrenChen
Are there any other frameworks tested besides transformers that can be deployed?
For example, vLLM, llama.cpp, SGLang, etc. Since the released models include both safetensors and GGUF files, I assume llama.cpp should be able to support them? Also, does your own framework, PowerInfer, support this model family?
Yes, you can try our model with the latest llama.cpp and PowerInfer. For other frameworks such as vLLM, we have already submitted a PR, available at https://github.com/vllm-project/vllm/pull/21670.
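For anyone who wants to try the llama.cpp route, here is a minimal sketch of assembling a one-shot generation command from Python. It assumes a recent llama.cpp build that ships the `llama-cli` binary on your PATH and a GGUF file you have already downloaded. The helper name `build_llama_cli_command` and the path `models/model.gguf` are placeholders made up for illustration, not names from the model repository.

```python
import shlex


def build_llama_cli_command(model_path, prompt, max_tokens=128, gpu_layers=0):
    """Return the argv list for a one-shot llama.cpp generation run.

    model_path is a hypothetical local GGUF path; substitute your own file.
    """
    cmd = [
        "llama-cli",            # CLI binary from a recent llama.cpp build (assumed on PATH)
        "-m", str(model_path),  # path to the GGUF model file
        "-p", prompt,           # prompt text
        "-n", str(max_tokens),  # number of tokens to generate
    ]
    if gpu_layers > 0:
        # Offload this many layers to the GPU; omit the flag for a CPU-only run.
        cmd += ["-ngl", str(gpu_layers)]
    return cmd


if __name__ == "__main__":
    # Placeholder path: point this at the GGUF file you actually downloaded.
    cmd = build_llama_cli_command("models/model.gguf", "Hello, who are you?")
    print(shlex.join(cmd))
    # To actually run it: import subprocess; subprocess.run(cmd, check=True)
```

Running with `gpu_layers=0` and then again with a positive value is a simple way to compare CPU-only and GPU-offloaded speed on your own hardware.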
The paper's abstract suggests this co-designed system largely removes the requirement for costly GPUs, implying CPU optimization. I'm curious if GPUs still see performance gains from its novel network architecture.