SageMaker deployment
I deployed the model with bitsandbytes-nf4 quantization on a g5.4xlarge instance.
Invoking the endpoint with the following payload fails:
    payload = {
      "inputs":  prompt,
      "parameters": {
        "top_p": tp,
        "temperature": tmp,
        "top_k": 50,
        "max_new_tokens": 1024,
        "repetition_penalty": 1.03,
        "stop": [""]
      }
    }
It gives this error:
ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received server error (0) from primary with message "Your invocation timed out while waiting for a response from container primary. Review the latency metrics for each container in Amazon CloudWatch, resolve the issue, and try again.". See https://us-east-1.console.aws.amazon.com/cloudwatch/home?region=us-east-1#logEventViewer:group=/aws/sagemaker/Endpoints/code-llama-34b in account 346347345 for more information.
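For reference, here is a minimal sketch of an invocation like the one described, assuming the endpoint is called through boto3's `sagemaker-runtime` client. The helper names `build_payload` and `invoke`, the default `top_p` and `temperature` values, and the lower `max_new_tokens` are all assumptions made for illustration. SageMaker real-time endpoints are documented to need a response within 60 seconds, so a smaller `max_new_tokens` may be worth trying to see whether the timeout goes away.

```python
import json


def build_payload(prompt, top_p=0.9, temperature=0.2, max_new_tokens=256):
    """Build a request body in the same shape as the payload above.

    The defaults are placeholders, not the values used in the original call.
    The "stop" list is left out because its contents are not known.
    """
    return {
        "inputs": prompt,
        "parameters": {
            "top_p": top_p,
            "temperature": temperature,
            "top_k": 50,
            "max_new_tokens": max_new_tokens,
            "repetition_penalty": 1.03,
        },
    }


def invoke(client, endpoint_name, payload):
    """Send the payload as JSON and decode the JSON response body.

    `client` is expected to be a boto3 "sagemaker-runtime" client.
    """
    response = client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Body=json.dumps(payload),
    )
    return json.loads(response["Body"].read())


# Example usage (requires AWS credentials and a running endpoint):
#   import boto3
#   client = boto3.client("sagemaker-runtime", region_name="us-east-1")
#   print(invoke(client, "code-llama-34b", build_payload("def fib(n):")))
```

The endpoint name `code-llama-34b` in the usage comment is taken from the log group in the error message; the region is the one shown in the CloudWatch URL.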
Could you point me to any documentation or a code repository that contains working deployment code?
Thank you
