Unable to deploy model with huggingface tei

#2
by dhruv-wrk - opened

Hi, has anyone been able to deploy this model with Hugging Face TEI on SageMaker? I am trying to work out how to use it on SageMaker and do the sparse embedding computation through the endpoint.

opensearch-project org

Hi @dhruv-wrk , I'm not very familiar with HF TEI; I will take a look into it. Before we arrive at a solution, you can try this tutorial to get the model deployed on SageMaker: https://github.com/opensearch-project/ml-commons/blob/main/docs/model_serving_framework/deploy_sparse_model_to_SageMaker.ipynb
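
Once that notebook has created an endpoint, invoking it for sparse embeddings looks roughly like the sketch below. The endpoint name here is a placeholder, and the exact payload shape depends on the container the notebook builds, so treat this only as the general boto3 pattern:

import json
import boto3

# placeholder name; use whatever endpoint the notebook created
endpoint_name = "neural-sparse-endpoint"

runtime = boto3.client("sagemaker-runtime")
response = runtime.invoke_endpoint(
    EndpointName=endpoint_name,
    ContentType="application/json",
    Body=json.dumps({"inputs": "My name is Clara and I am"}),
)
print(json.loads(response["Body"].read()))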

I am facing the same issue as @dhruv-wrk.

Code as given on the model card:

import json
import sagemaker
import boto3
from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

try:
    role = sagemaker.get_execution_role()
except ValueError:
    iam = boto3.client('iam')
    role = iam.get_role(RoleName='sagemaker_execution_role')['Role']['Arn']

# Hub Model configuration. https://huggingface.co/models
hub = {
    'HF_MODEL_ID':'opensearch-project/opensearch-neural-sparse-encoding-doc-v3-gte'
}


# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
    image_uri=get_huggingface_llm_image_uri("huggingface-tei",version="1.8.2"),
    env=hub,
    role=role, 
)

# deploy model to SageMaker Inference
predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",
)

# send request
predictor.predict({
    "inputs": "My name is Clara and I am",
})
opensearch-project org

Hi @dhruv-wrk @vishva399 ,

To use SPLADE pooling in TEI, we need to make one change to @vishva399 's code, i.e. add a "POOLING" field to the env:

hub = {
    'HF_MODEL_ID':'opensearch-project/opensearch-neural-sparse-encoding-doc-v2-mini',
    "POOLING": "splade",
}
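
For completeness, here is the full snippet from above with only that change applied (same TEI image version and instance type; the request payload stays the same):

import sagemaker
import boto3
from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

try:
    role = sagemaker.get_execution_role()
except ValueError:
    iam = boto3.client('iam')
    role = iam.get_role(RoleName='sagemaker_execution_role')['Role']['Arn']

# the POOLING override tells TEI to apply SPLADE pooling instead of its default
hub = {
    'HF_MODEL_ID': 'opensearch-project/opensearch-neural-sparse-encoding-doc-v2-mini',
    'POOLING': 'splade',
}

huggingface_model = HuggingFaceModel(
    image_uri=get_huggingface_llm_image_uri("huggingface-tei", version="1.8.2"),
    env=hub,
    role=role,
)

predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",
)

predictor.predict({
    "inputs": "My name is Clara and I am",
})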

Furthermore, TEI's pooling logic is hard-coded: https://github.com/huggingface/text-embeddings-inference/blob/9ef569d83083afa30784223d0a0352229d094898/backends/python/server/text_embeddings_server/models/pooling.py#L38 For the v3 series we use log1p_relu, which is different from TEI's implementation, so we'd recommend using the v1/v2 series models with TEI.
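
For reference, here is a minimal sketch of how the v2 doc models compute sparse vectors with plain transformers, adapted from the v2 model cards; it's a useful baseline to compare against what a TEI endpoint returns (the masking details are an approximation of the model card code):

import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

model_id = "opensearch-project/opensearch-neural-sparse-encoding-doc-v2-mini"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id)

# special tokens are zeroed out so they never contribute to the sparse vector
special_token_ids = [
    tokenizer.vocab[token] for token in tokenizer.special_tokens_map.values()
]

def get_sparse_vector(feature, output):
    # max-pool the MLM logits over the sequence, ignoring padding positions
    values, _ = torch.max(output * feature["attention_mask"].unsqueeze(-1), dim=1)
    # SPLADE-style activation from the v2 model card: log(1 + relu(x));
    # the v3 series uses a different activation, per the note above
    values = torch.log(1 + torch.relu(values))
    values[:, special_token_ids] = 0
    return values

feature = tokenizer(
    ["My name is Clara and I am"],
    padding=True, truncation=True, return_tensors="pt",
)
output = model(**feature)[0]
sparse_vector = get_sparse_vector(feature, output)  # shape: [1, vocab_size]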

opensearch-project org

And to support the new pooling options, we would need to create issues or PRs against the huggingface/text-embeddings-inference repo.

Thank you for your quick response, it was very helpful.
