# Mistral on AWS Inf2 with FastAPI
Use FastAPI to quickly host serving of Mistral model on AWS Inferentia2 instance Inf2 🚀
Support Multimodal input type (input_embeds) 🖼️

![image](https://github.com/davidshtian/Mistral-on-AWS-Inf2-with-FastAPI/assets/14228056/94f8aa15-6851-41d5-b89e-2b8699949fef)


## Environment Setup
Follow the instructions in Neuron docs [Pytorch Neuron Setup](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-setup.html) for basic environment setup. 

## Install Packages
Go to the virtual env and install the extra packages.
```
cd app
pip install -r requirements.txt
```

## Run the App
```
uvicorn main:app --host 0.0.0.0 --port 8000
```

## Send the Request
Test via the input_ids (normal prompt) version:
```
cd client
python client.py
```

Test via the input_embeds (common multimodal input, skip embedding layer) version:
```
cd client
python embeds_client.py
```

## Container
You could build container image using the Dockerfile, or using the pre-build image:
```
docker run --rm --name mistral -d -p 8000:8000 --device=/dev/neuron0 public.ecr.aws/shtian/fastapi-mistral
```