Quickstart - Deploy Hugging Face Models with SageMaker Jumpstart

Why use SageMaker JumpStart for Hugging Face models?

Amazon SageMaker JumpStart lets you deploy the most popular open Hugging Face models with one click, inside your own AWS account. JumpStart offers a curated selection of model checkpoints for a variety of tasks, including text generation, embeddings, vision, and audio. Most models are deployed with the official Hugging Face Deep Learning Containers and a sensible default instance type, so you can move from idea to production in minutes.

In this quickstart guide, we will deploy Qwen/Qwen2.5-14B-Instruct.

1. Prerequisites

Requirement | Description
AWS account with SageMaker enabled | An AWS account that will contain all your AWS resources.
An IAM role to access SageMaker AI | Learn more about how IAM works with SageMaker AI in this guide.
SageMaker Studio domain and user profile | We recommend using SageMaker Studio for straightforward deployment and inference. Follow this guide.
Service quotas | Most LLMs need GPU instances (e.g. ml.g5). Verify you have quota for ml.g5.24xlarge or request it; see the snippet below to check programmatically.
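
To check your current quota from code, here is a minimal sketch using the boto3 Service Quotas API. It assumes configured AWS credentials, and the quota name string is an assumption that may vary slightly across regions or service updates:

import boto3

# List SageMaker service quotas and print the ones covering ml.g5.24xlarge
# endpoint usage (the name filter below is an assumption; adjust if needed).
client = boto3.client("service-quotas")
paginator = client.get_paginator("list_service_quotas")
for page in paginator.paginate(ServiceCode="sagemaker"):
    for quota in page["Quotas"]:
        if "ml.g5.24xlarge for endpoint usage" in quota["QuotaName"]:
            print(quota["QuotaName"], "->", quota["Value"])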

2. Endpoint deployment

Let’s walk through how to deploy a Hugging Face model to SageMaker by browsing the JumpStart catalog:

  1. Open SageMaker → JumpStart.
  2. Filter “Hugging Face” or search for your model (e.g. Qwen2.5-14B).
  3. Click Deploy → (optional) adjust instance size / count → Deploy.
  4. Wait until Endpoints shows In service.
  5. Copy the Endpoint name (or ARN) for later use.
JumpStart deployment demo
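
If you would rather wait on the endpoint from code than watch the console, here is a small boto3 sketch; the endpoint name is a placeholder for the one you copied in step 5:

import boto3

sm = boto3.client("sagemaker")
endpoint_name = "MY ENDPOINT NAME"  # placeholder: paste the name from step 5

# Block until the endpoint reaches "InService" (raises if creation fails).
sm.get_waiter("endpoint_in_service").wait(EndpointName=endpoint_name)
print(sm.describe_endpoint(EndpointName=endpoint_name)["EndpointStatus"])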

Alternatively, you can browse the Hugging Face Model Hub:

  1. Open the model page → click Deploy → SageMaker → JumpStart tab (if the model is available on JumpStart).
  2. Copy the code snippet and use it from a SageMaker Notebook instance.
JumpStart deployment demo
# SageMaker JumpStart provides APIs as part of the SageMaker SDK that let you deploy
# and fine-tune models in network isolation, using scripts that SageMaker maintains.
from sagemaker.jumpstart.model import JumpStartModel

# Look up the model by its JumpStart model ID and fetch its example payloads.
model = JumpStartModel(model_id="huggingface-llm-qwen2-5-14b-instruct")
example_payloads = model.retrieve_all_examples()

# Deploy to a real-time endpoint using the JumpStart default instance type.
predictor = model.deploy()

# Send each example payload to the endpoint and print the completion.
for payload in example_payloads:
    response = predictor.predict(payload.body)
    print("Input:\n", payload.body[payload.prompt_key])
    print("Output:\n", response[0]["generated_text"], "\n\n===============\n")

The endpoint creation can take several minutes, depending on the size of the model.
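
If the default instance type does not match your quota or latency needs, deploy() accepts overrides. A minimal sketch; the instance type here is an illustrative choice, not a recommendation:

from sagemaker.jumpstart.model import JumpStartModel

model = JumpStartModel(model_id="huggingface-llm-qwen2-5-14b-instruct")

# Pin the instance type and count instead of the JumpStart defaults
# (ml.g5.24xlarge is an example; some gated models also need accept_eula=True).
predictor = model.deploy(
    instance_type="ml.g5.24xlarge",
    initial_instance_count=1,
)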

3. Test interactively

If you deployed through the console, grab the endpoint name and reuse it in your code.

from sagemaker.predictor import retrieve_default

# Rebuild a predictor from an existing endpoint (paste your endpoint name here).
endpoint_name = "MY ENDPOINT NAME"
predictor = retrieve_default(endpoint_name)

# OpenAI-style chat payload: system prompt, user message, and sampling parameters.
payload = {
    "messages": [
        {
            "role": "system",
            "content": "You are a passionate data scientist."
        },
        {
            "role": "user",
            "content": "what is machine learning?"
        }
    ],
    "max_tokens": 2048,
    "temperature": 0.7,
    "top_p": 0.9,
    "stream": False
}

response = predictor.predict(payload)
print(response)

The endpoint supports the OpenAI API specification.
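
Since the endpoint speaks this format, you can also invoke it without the SageMaker Python SDK, for example from a Lambda function, using plain boto3. A minimal sketch; the endpoint name is a placeholder, and the response parsing assumes an OpenAI-style chat completion shape:

import json

import boto3

runtime = boto3.client("sagemaker-runtime")

response = runtime.invoke_endpoint(
    EndpointName="MY ENDPOINT NAME",  # placeholder: your endpoint name
    ContentType="application/json",
    Body=json.dumps(payload),  # the same messages payload as above
)
body = json.loads(response["Body"].read())

# Assuming an OpenAI-style chat completion response:
print(body["choices"][0]["message"]["content"])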

4. Clean up

To avoid incurring unnecessary costs when you’re done, delete the SageMaker endpoint in the Deployments → Endpoints console or with the following code snippet:

# Delete the model and endpoint to stop incurring charges.
predictor.delete_model()
predictor.delete_endpoint()
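
If you no longer have the predictor object, for example after deploying from the console, you can clean up with boto3 instead. A sketch; the endpoint name is a placeholder, and the config and model names are looked up before deletion:

import boto3

sm = boto3.client("sagemaker")
endpoint_name = "MY ENDPOINT NAME"  # placeholder

# Look up the endpoint config and model behind the endpoint before deleting.
config_name = sm.describe_endpoint(EndpointName=endpoint_name)["EndpointConfigName"]
model_name = sm.describe_endpoint_config(EndpointConfigName=config_name)[
    "ProductionVariants"
][0]["ModelName"]

sm.delete_endpoint(EndpointName=endpoint_name)
sm.delete_endpoint_config(EndpointConfigName=config_name)
sm.delete_model(ModelName=model_name)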