Deployment to Inference Endpoints

#34 · opened by stcat

Hi,
I got the same issue. @stmackcat, did you resolve it?

Hello @promios. Unfortunately, no. I have tried Llama 3.1 8B and 70B across various Inference Endpoint configurations, and all failed with similar messages.

This issue needs attention...

Yep, this is very urgent. I can't deploy it on SageMaker either. Any workaround?

Any update on this issue?

Hi all! Thanks for reporting, and very sorry for the wait. We are working on a fix for easy deployment of meta-llama/Meta-Llama-3.1-8B-Instruct in Inference Endpoints. In the meantime, please ensure the container URI points to ghcr.io/huggingface/text-generation-inference:2.2.0, the latest version of TGI. For example, if your Endpoint is already created and in a failed state, you can change the container URI in the UI: open the Endpoint's settings, go to Container Configuration, select 'Custom', and update the URI.
[Screenshot: TGI Container Configuration]
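If you'd rather make the same change programmatically than click through the UI, the `huggingface_hub` library exposes `update_inference_endpoint`. A minimal sketch, assuming your endpoint is named `llama-3-1-8b-instruct` (a placeholder; use your endpoint's actual name):

```python
from huggingface_hub import update_inference_endpoint

# Point the existing endpoint at the TGI 2.2.0 container.
# "llama-3-1-8b-instruct" is an example name -- substitute your own.
endpoint = update_inference_endpoint(
    "llama-3-1-8b-instruct",
    custom_image={
        "health_route": "/health",
        "env": {"MODEL_ID": "/repository"},
        "url": "ghcr.io/huggingface/text-generation-inference:2.2.0",
    },
)
endpoint.wait()  # block until the endpoint is back in a running state
```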

If you're deploying a new endpoint, this can be done under the 'Advanced Configuration' tab at https://ui.endpoints.huggingface.co/new. Select 'Custom' there as well, then update the container URI to ghcr.io/huggingface/text-generation-inference:2.2.0.
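The same new-endpoint flow also works from Python with `create_inference_endpoint`. Here's a sketch; the endpoint name, vendor, region, and instance values are illustrative, so adjust them to whatever your account supports:

```python
from huggingface_hub import create_inference_endpoint

# All names and instance values below are examples, not requirements.
endpoint = create_inference_endpoint(
    "llama-3-1-8b-instruct",
    repository="meta-llama/Meta-Llama-3.1-8B-Instruct",
    framework="pytorch",
    task="text-generation",
    accelerator="gpu",
    vendor="aws",
    region="us-east-1",
    type="protected",
    instance_size="x1",
    instance_type="nvidia-a10g",
    custom_image={
        "health_route": "/health",
        "env": {"MODEL_ID": "/repository"},  # see the note on MODEL_ID below
        "url": "ghcr.io/huggingface/text-generation-inference:2.2.0",
    },
)
```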

You'll also want to pass this environment variable to the endpoint: MODEL_ID=/repository, which will let you test the model using the widget once it's successfully deployed. I'm attaching a screenshot just in case.
[Screenshot: Env variable]
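Once the endpoint reports Running, you can also sanity-check it from Python instead of the widget. A sketch using the endpoint's built-in client (again, `llama-3-1-8b-instruct` is a placeholder name):

```python
from huggingface_hub import get_inference_endpoint

endpoint = get_inference_endpoint("llama-3-1-8b-instruct")
endpoint.wait()  # no-op if the endpoint is already running

# The endpoint object exposes an InferenceClient pointed at its URL.
output = endpoint.client.text_generation(
    "What is the capital of France?",
    max_new_tokens=64,
)
print(output)
```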

We're actively working on a fix for easier deployment of this model, but in the meantime please let me know if you have additional questions!
