Text Generation
Transformers
Safetensors
English
Japanese
llama
conversational
text-generation-inference

Loading and inference time

#1
by NEWWWWWbie - opened

I've been testing a deployment of this model and noticed that it takes roughly 30 minutes to load and around 40 minutes to return a response. This seems unusually slow, especially compared with LLaMA 3.1 8B Instruct, which loads and responds much faster in a similar environment.

Trend Micro (AI Lab) org

Hi @NEWWWWWbie,
Could you share reproducible code and details of your environment?
Our model is identical to Llama 3.1 8B Instruct except for the parameter values, and we haven't observed this issue in our own testing.
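For anyone hitting the same symptom, a minimal timing sketch along these lines can help separate time spent loading weights from time spent generating. This is an assumption-laden example, not an official reproduction script: the model ID is a placeholder, and dtype/device settings should be adjusted to your setup.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results):
    """Record wall-clock seconds for the enclosed block under `label`."""
    start = time.perf_counter()
    yield
    results[label] = time.perf_counter() - start


def report(results):
    """Format collected timings as 'label: 1.23s' lines."""
    return "\n".join(f"{k}: {v:.2f}s" for k, v in results.items())


if __name__ == "__main__":
    # Placeholder model ID; substitute the repo you are actually testing.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "meta-llama/Llama-3.1-8B-Instruct"
    timings = {}

    with timed("load", timings):
        tokenizer = AutoTokenizer.from_pretrained(model_id)
        model = AutoModelForCausalLM.from_pretrained(
            model_id,
            torch_dtype=torch.bfloat16,
            device_map="auto",
        )

    with timed("generate", timings):
        inputs = tokenizer("Hello, how are you?", return_tensors="pt").to(model.device)
        model.generate(**inputs, max_new_tokens=64)

    print(report(timings))
```

If the "load" phase dominates, the usual suspects are a cold Hub download, slow disk, or spilling to CPU/disk offload for lack of GPU memory; if "generate" dominates, check whether the model actually landed on the GPU.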
