Deployment to Inference Endpoints
Hi,
I'm hitting the same issue. @stmackcat, did you resolve it?
This issue needs attention...
Yep, this is very urgent. Can't deploy it on SageMaker. Any workaround?
Any update on this issue?
Hi all! Thanks for reporting, and very sorry for the wait. We are working on a fix for easy deployment of meta-llama/Meta-Llama-3.1-8B-Instruct in Inference Endpoints -- in the meantime, please ensure the container URI points to ghcr.io/huggingface/text-generation-inference:2.2.0, the latest version of TGI. For example, if your Endpoint has already been created and is in a failed state, you can change the container URI in the UI: open the Endpoint's settings, go to Container Configuration, select 'Custom', and update the URI.
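If you'd rather update an existing Endpoint from code, here's a minimal sketch using huggingface_hub, assuming a recent release where update_inference_endpoint accepts a custom_image dict; the endpoint name below is a placeholder:

```python
from huggingface_hub import update_inference_endpoint

# Repoint an existing (failed) Endpoint at the TGI 2.2.0 image.
# "my-llama-endpoint" is a placeholder -- use your Endpoint's actual name.
endpoint = update_inference_endpoint(
    "my-llama-endpoint",
    custom_image={
        "url": "ghcr.io/huggingface/text-generation-inference:2.2.0",
        "health_route": "/health",  # TGI's health-check route
        "env": {"MODEL_ID": "/repository"},
    },
)
endpoint.wait()  # block until the Endpoint reports it's running
print(endpoint.url)
```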
If you're deploying a new endpoint, this can be done under the 'Advanced Configuration' tab here: https://ui.endpoints.huggingface.co/new. Select 'Custom' there as well, then update the container URI to ghcr.io/huggingface/text-generation-inference:2.2.0.
You'll also want to pass this env variable to the endpoint: MODEL_ID=/repository, which will allow you to test the model using the widget once it's successfully deployed. I'm attaching a screenshot just in case.
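If you prefer to script the whole thing rather than use the UI, here's a minimal sketch with huggingface_hub's create_inference_endpoint that pins the image and sets the env variable in one call. The endpoint name, vendor, region, and instance choices below are placeholders -- adjust them for your account and hardware:

```python
from huggingface_hub import create_inference_endpoint

# Create a new Endpoint pinned to TGI 2.2.0 with MODEL_ID=/repository.
endpoint = create_inference_endpoint(
    "llama-3-1-8b-instruct",  # placeholder name
    repository="meta-llama/Meta-Llama-3.1-8B-Instruct",
    framework="pytorch",
    task="text-generation",
    accelerator="gpu",
    vendor="aws",             # placeholder cloud provider
    region="us-east-1",       # placeholder region
    type="protected",
    instance_size="x1",       # placeholder instance choice
    instance_type="nvidia-a10g",
    custom_image={
        "url": "ghcr.io/huggingface/text-generation-inference:2.2.0",
        "health_route": "/health",
        "env": {"MODEL_ID": "/repository"},
    },
)
endpoint.wait()  # wait for the Endpoint to come up
print(endpoint.url)
```

Once it's running, the widget on the Endpoint page should work as described above.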
We're actively working on a fix for easier deployment of this model, but in the meantime please let me know if you have additional questions!