Deployment to Inference Endpoints

#34 · opened by stcat

Hi,
I got the same issue. @stmackcat, did you resolve it?

Hello @promios. Unfortunately, no. I have tried Llama 3.1 8B and 70B across various Inference Endpoint configurations, and all failed with similar messages.

This issue needs attention...

Yep, this is very urgent. I can't deploy it on SageMaker either. Any workaround?

Any update on this issue?

Hi all! Thanks for reporting, and very sorry for the wait. We are working on a fix for easy deployment of meta-llama/Meta-Llama-3.1-8B-Instruct in Inference Endpoints. In the meantime, please ensure the container URI points to ghcr.io/huggingface/text-generation-inference:2.2.0, the latest version of TGI. For example, if your Endpoint is already created and in a failed state, you can change the container URI in the UI: open the Endpoint's settings, go to Container Configuration, select 'Custom', and update the URI.
[Screenshot: TGI Container Configuration]
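If you'd rather make the same change programmatically than click through the UI, the `huggingface_hub` library exposes `update_inference_endpoint`. A minimal sketch, assuming your endpoint is named `llama-3-1-8b-instruct` (a placeholder; use your endpoint's actual name):

```python
from huggingface_hub import update_inference_endpoint

# Point the existing endpoint at the TGI 2.2.0 container.
# "llama-3-1-8b-instruct" is an example name -- substitute your own.
endpoint = update_inference_endpoint(
    "llama-3-1-8b-instruct",
    custom_image={
        "health_route": "/health",
        "env": {"MODEL_ID": "/repository"},
        "url": "ghcr.io/huggingface/text-generation-inference:2.2.0",
    },
)
endpoint.wait()  # block until the endpoint is back in a running state
```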

If you're deploying a new endpoint, this can be done under the 'Advanced Configuration' tab at https://ui.endpoints.huggingface.co/new. Select 'Custom' there as well, then update the container URI to ghcr.io/huggingface/text-generation-inference:2.2.0.
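The same new-endpoint flow also works from Python with `create_inference_endpoint`. Here's a sketch; the endpoint name, vendor, region, and instance values are illustrative, so adjust them to whatever your account supports:

```python
from huggingface_hub import create_inference_endpoint

# All names and instance values below are examples, not requirements.
endpoint = create_inference_endpoint(
    "llama-3-1-8b-instruct",
    repository="meta-llama/Meta-Llama-3.1-8B-Instruct",
    framework="pytorch",
    task="text-generation",
    accelerator="gpu",
    vendor="aws",
    region="us-east-1",
    type="protected",
    instance_size="x1",
    instance_type="nvidia-a10g",
    custom_image={
        "health_route": "/health",
        "env": {"MODEL_ID": "/repository"},  # see the note on MODEL_ID below
        "url": "ghcr.io/huggingface/text-generation-inference:2.2.0",
    },
)
```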

You'll also want to pass this environment variable to the endpoint: MODEL_ID=/repository, which will let you test the model using the widget once it's successfully deployed. I'm attaching a screenshot just in case.
[Screenshot: Env variable]
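Once the endpoint reports Running, you can also sanity-check it from Python instead of the widget. A sketch using the endpoint's built-in client (again, `llama-3-1-8b-instruct` is a placeholder name):

```python
from huggingface_hub import get_inference_endpoint

endpoint = get_inference_endpoint("llama-3-1-8b-instruct")
endpoint.wait()  # no-op if the endpoint is already running

# The endpoint object exposes an InferenceClient pointed at its URL.
output = endpoint.client.text_generation(
    "What is the capital of France?",
    max_new_tokens=64,
)
print(output)
```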

We're actively working on a fix for easier deployment of this model, but in the meantime please let me know if you have additional questions!
