Phind/Phind-CodeLlama-34B-v2 · SageMaker deployment

I tried deploying the model with bitsandbytes-nf4 quantization on a g5.4xlarge instance.

When I invoke the endpoint with the following payload,
payload = {
    "inputs": prompt,
    "parameters": {
        "top_p": tp,
        "temperature": tmp,
        "top_k": 50,
        "max_new_tokens": 1024,
        "repetition_penalty": 1.03,
        "stop": [""]
    }
}
it gives the following error:
ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received server error (0) from primary with message "Your invocation timed out while waiting for a response from container primary. Review the latency metrics for each container in Amazon CloudWatch, resolve the issue, and try again.". See https://us-east-1.console.aws.amazon.com/cloudwatch/home?region=us-east-1#logEventViewer:group=/aws/sagemaker/Endpoints/code-llama-34b in account 346347345 for more information.
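
For anyone trying to reproduce this, here is a minimal sketch of the kind of invocation described above, assuming boto3 and the standard `sagemaker-runtime` client. The endpoint name and region are read off the CloudWatch URL in the error. The helper names `build_payload` and `invoke` are hypothetical, and the `stop` entry is left out because its value does not appear in the payload above. `max_new_tokens` is exposed as an argument so a smaller value can be tried when checking whether the timeout depends on generation length.

```python
import json

ENDPOINT_NAME = "code-llama-34b"  # from the CloudWatch log group in the error message
REGION = "us-east-1"              # from the CloudWatch URL in the error message


def build_payload(prompt, top_p, temperature, max_new_tokens=1024):
    """Build a request body with the same parameters as the payload above (minus "stop")."""
    return {
        "inputs": prompt,
        "parameters": {
            "top_p": top_p,
            "temperature": temperature,
            "top_k": 50,
            "max_new_tokens": max_new_tokens,
            "repetition_penalty": 1.03,
        },
    }


def invoke(payload):
    """Send the payload to the endpoint and return the decoded JSON response."""
    import boto3  # imported lazily so build_payload works without AWS access

    client = boto3.client("sagemaker-runtime", region_name=REGION)
    response = client.invoke_endpoint(
        EndpointName=ENDPOINT_NAME,
        ContentType="application/json",
        Body=json.dumps(payload),
    )
    return json.loads(response["Body"].read())
```

For example, `invoke(build_payload(prompt, tp, tmp, max_new_tokens=128))` would send a much shorter generation request than the one that timed out.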

Could you please point me to any documentation or a code repository where I can find the deployment code?
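
To make the question concrete, this is a rough sketch of the kind of deployment meant here, assuming the SageMaker Python SDK and the Hugging Face LLM (TGI) container. It is not a verified recipe for this model. The token limits, the startup timeout, and the helper names `build_tgi_env` and `deploy` are assumptions for illustration. Whether the default container version supports `bitsandbytes-nf4` is also an assumption.

```python
def build_tgi_env(model_id="Phind/Phind-CodeLlama-34B-v2",
                  max_input_length=2048, max_total_tokens=4096):
    """Environment variables for the Hugging Face LLM container (token limits are assumed)."""
    return {
        "HF_MODEL_ID": model_id,
        "SM_NUM_GPUS": "1",  # g5.4xlarge has a single GPU
        "HF_MODEL_QUANTIZE": "bitsandbytes-nf4",
        "MAX_INPUT_LENGTH": str(max_input_length),
        "MAX_TOTAL_TOKENS": str(max_total_tokens),
    }


def deploy(role_arn, endpoint_name="code-llama-34b"):
    """Create the model and deploy it to a real-time endpoint (role_arn is supplied by the caller)."""
    # Imported lazily so build_tgi_env can be inspected without the SDK installed.
    from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

    image_uri = get_huggingface_llm_image_uri("huggingface")
    model = HuggingFaceModel(role=role_arn, image_uri=image_uri, env=build_tgi_env())
    return model.deploy(
        initial_instance_count=1,
        instance_type="ml.g5.4xlarge",
        endpoint_name=endpoint_name,
        # Assumed value: a 34B model can take a while to download and load.
        container_startup_health_check_timeout=900,
    )
```

If there is an official or recommended variant of this for Phind-CodeLlama-34B-v2, a pointer to it would be enough.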

Thank you