Unable to deploy model with huggingface tei

#2
by dhruv-wrk - opened

Hi, has anyone been able to deploy this model with Hugging Face TEI on SageMaker? I am trying to work out how to use it on SageMaker and do the sparse embedding computation through the endpoint.

opensearch-project org

Hi @dhruv-wrk , I'm not very familiar with HF TEI; I will take a look into it. Before we arrive at a solution, you can try this tutorial to get the model deployed on SageMaker: https://github.com/opensearch-project/ml-commons/blob/main/docs/model_serving_framework/deploy_sparse_model_to_SageMaker.ipynb
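
Once that notebook has created an endpoint, invoking it for sparse embeddings looks roughly like the sketch below. The endpoint name here is a placeholder, and the exact payload shape depends on the container the notebook builds, so treat this only as the general boto3 pattern:

import json
import boto3

# placeholder name; use whatever endpoint the notebook created
endpoint_name = "neural-sparse-endpoint"

runtime = boto3.client("sagemaker-runtime")
response = runtime.invoke_endpoint(
    EndpointName=endpoint_name,
    ContentType="application/json",
    Body=json.dumps({"inputs": "My name is Clara and I am"}),
)
print(json.loads(response["Body"].read()))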

I am facing the same issue as @dhruv-wrk.

Code as given on the model card:

import json
import sagemaker
import boto3
from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

try:
    role = sagemaker.get_execution_role()
except ValueError:
    iam = boto3.client('iam')
    role = iam.get_role(RoleName='sagemaker_execution_role')['Role']['Arn']

# Hub Model configuration. https://huggingface.co/models
hub = {
    'HF_MODEL_ID':'opensearch-project/opensearch-neural-sparse-encoding-doc-v3-gte'
}


# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
    image_uri=get_huggingface_llm_image_uri("huggingface-tei",version="1.8.2"),
    env=hub,
    role=role, 
)

# deploy model to SageMaker Inference
predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",
)

# send request
predictor.predict({
    "inputs": "My name is Clara and I am",
})
opensearch-project org

Hi @dhruv-wrk @vishva399 ,

To use SPLADE pooling in TEI, we need to make one change to @vishva399 's code, i.e. add a "POOLING" field to the env:

hub = {
    'HF_MODEL_ID':'opensearch-project/opensearch-neural-sparse-encoding-doc-v2-mini',
    "POOLING": "splade",
}
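
For completeness, here is the full snippet from above with only that change applied (same TEI image version and instance type; the request payload stays the same):

import sagemaker
import boto3
from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

try:
    role = sagemaker.get_execution_role()
except ValueError:
    iam = boto3.client('iam')
    role = iam.get_role(RoleName='sagemaker_execution_role')['Role']['Arn']

# the POOLING override tells TEI to apply SPLADE pooling instead of its default
hub = {
    'HF_MODEL_ID': 'opensearch-project/opensearch-neural-sparse-encoding-doc-v2-mini',
    'POOLING': 'splade',
}

huggingface_model = HuggingFaceModel(
    image_uri=get_huggingface_llm_image_uri("huggingface-tei", version="1.8.2"),
    env=hub,
    role=role,
)

predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",
)

predictor.predict({
    "inputs": "My name is Clara and I am",
})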

Furthermore, TEI's pooling logic is hard-coded: https://github.com/huggingface/text-embeddings-inference/blob/9ef569d83083afa30784223d0a0352229d094898/backends/python/server/text_embeddings_server/models/pooling.py#L38 For the v3 series we use log1p_relu, which is different from TEI's implementation, so we'd recommend using the v1/v2 series models with TEI.
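
For reference, here is a minimal sketch of how the v2 doc models compute sparse vectors with plain transformers, adapted from the v2 model cards; it's a useful baseline to compare against what a TEI endpoint returns (the masking details are an approximation of the model card code):

import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

model_id = "opensearch-project/opensearch-neural-sparse-encoding-doc-v2-mini"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id)

# special tokens are zeroed out so they never contribute to the sparse vector
special_token_ids = [
    tokenizer.vocab[token] for token in tokenizer.special_tokens_map.values()
]

def get_sparse_vector(feature, output):
    # max-pool the MLM logits over the sequence, ignoring padding positions
    values, _ = torch.max(output * feature["attention_mask"].unsqueeze(-1), dim=1)
    # SPLADE-style activation from the v2 model card: log(1 + relu(x));
    # the v3 series uses a different activation, per the note above
    values = torch.log(1 + torch.relu(values))
    values[:, special_token_ids] = 0
    return values

feature = tokenizer(
    ["My name is Clara and I am"],
    padding=True, truncation=True, return_tensors="pt",
)
output = model(**feature)[0]
sparse_vector = get_sparse_vector(feature, output)  # shape: [1, vocab_size]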

opensearch-project org

And to support the new pooling options, we would need to create issues or PRs against the huggingface/text-embeddings-inference repo.

Thank you for your quick response, it was very helpful.
