Loading and inference time
#1 opened by NEWWWWWbie
I've been testing a deployment of this model and noticed that it takes approximately 30 minutes to load and around 40 minutes to return a response. This seems unusually slow compared to LLaMA 3.1 8B Instruct, which loads and responds much faster in similar environments.
Hi @NEWWWWWbie,
Could you share reproducible code and details of your environment?
Our model is identical to Llama 3.1 8B Instruct except for the parameter values, and we haven't observed this issue when using it ourselves.
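For reference, a minimal timing harness along these lines would be enough to reproduce and compare the two models, assuming the model is loaded through Hugging Face transformers on a GPU (the model ID below is a placeholder; swap in the model under discussion):

```python
import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "meta-llama/Llama-3.1-8B-Instruct"  # placeholder model ID

# Time model loading
t0 = time.perf_counter()
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,  # half precision keeps memory and load time down
    device_map="auto",           # place weights on the available GPU(s)
)
print(f"load time: {time.perf_counter() - t0:.1f}s")

# Time a single generation
prompt = "Explain the difference between latency and throughput."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
t0 = time.perf_counter()
outputs = model.generate(**inputs, max_new_tokens=128)
print(f"generation time: {time.perf_counter() - t0:.1f}s")
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Running the same script against both model IDs on the same hardware would show whether the slowdown is specific to this model or to the environment (e.g., weights loading from a slow disk or network mount, or inference falling back to CPU).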