# PaddleMIX Inference Deployment

[[中文文档](README.md)]

PaddleMIX utilizes Paddle Inference and provides a Python-based deployment solution. There are two deployment methods:

1. **APPflow Deployment**:
   - By setting the `static_mode = True` variable in APPflow, you can enable static graph inference and additionally accelerate inference using TensorRT. Note that not all models support static graph or TensorRT; please refer to the [Multi Modal And Scenario](../applications/README_en.md/#multi-modal-and-scenario) section for specific model support.

2. **Single Model Deployment**

## 1. APPflow Deployment

When using APPflow, you can set the `static_mode = True` variable to enable static graph inference and optionally accelerate inference using TensorRT.

### 1.1 Examples

```python
>>> from paddlemix.appflow import Appflow
>>> from PIL import Image

>>> task = Appflow(app="openset_det_sam",
                   models=["GroundingDino/groundingdino-swint-ogc", "Sam/SamVitH-1024"],
                   static_mode=True,
                   precision="fp32")
>>> image_pil = Image.open("beauty.png").convert("RGB")
>>> result = task(image=image_pil, prompt="women")
```

### 1.2 Parameter Explanation

| Parameter | Required? | Meaning |
|-------|-------|---------------------------------------------------------------------------------------------|
| --app | Yes | Application name |
| --models | Yes | Model(s) used. Can be one model, or multiple models |
| --static_mode | Optional | Whether to use static graph inference, defaults to False |
| --precision | Optional | When `static_mode == True`, it defaults to FP32. You can optionally select `trt_fp32` or `trt_fp16`. |

Instructions:

- Some models do not support static graph or TensorRT. For specific information, please refer to [Multi Modal And Scenario](../applications/README_en.md/#multi-modal-and-scenario).
- The generated static graph will be located in the folder corresponding to the model name, for example: `GroundingDino/groundingdino-swint-ogc/`.

## 2. Single Model Prediction Deployment

Python-based prediction deployment mainly involves two steps:

- Exporting the predictive model
- Performing prediction using Python

Currently supported models:

- [blip2](./blip2/README.md)
- [groundingdino](./groundingdino/README.md)
- [sam](./sam/README.md)
- [qwen_vl](./qwen_vl/README.md)

The following uses groundingdino as an example.

### 2.1 Exporting Predictive Model

```bash
cd deploy/groundingdino
# Export the groundingdino model
python export.py \
--dino_type GroundingDino/groundingdino-swint-ogc
```

The model is exported to `output_groundingdino/GroundingDino/groundingdino-swint-ogc/` (used as `--model_path` in the next step), which contains `model_state.pdiparams`, `model_state.pdiparams.info`, `model_state.pdmodel` and other files.

### 2.2 Python-based Inference

```bash
 python predict.py \
 --text_encoder_type GroundingDino/groundingdino-swint-ogc \
 --model_path output_groundingdino/GroundingDino/groundingdino-swint-ogc \
 --input_image https://bj.bcebos.com/v1/paddlenlp/models/community/GroundingDino/000000004505.jpg \
 --output_dir ./groundingdino_predict_output \
 --prompt "bus"
```
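`predict.py` runs the exported static graph through Paddle Inference. If you want to load the exported `model_state.pdmodel` / `model_state.pdiparams` files in your own script, the sketch below shows the generic Paddle Inference Python API flow; it is not the project's own predictor, the file paths are illustrative, and real models such as GroundingDino expect several preprocessed inputs, so the dummy tensor and single input shown here are placeholders only.

```python
import numpy as np
from paddle.inference import Config, create_predictor

# Point Config at the exported static-graph files (paths are illustrative).
config = Config(
    "output_groundingdino/GroundingDino/groundingdino-swint-ogc/model_state.pdmodel",
    "output_groundingdino/GroundingDino/groundingdino-swint-ogc/model_state.pdiparams",
)
config.enable_use_gpu(100, 0)  # initial GPU memory pool in MB, device id

predictor = create_predictor(config)

# Inspect the model's input names and fill each one with preprocessed data.
# A real model may need several inputs; this dummy image tensor is a placeholder.
input_names = predictor.get_input_names()
dummy = np.zeros([1, 3, 800, 1193], dtype="float32")  # illustrative shape only
predictor.get_input_handle(input_names[0]).copy_from_cpu(dummy)

predictor.run()

# Collect all outputs back to host memory.
outputs = [
    predictor.get_output_handle(name).copy_to_cpu()
    for name in predictor.get_output_names()
]
```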
## 3. Benchmark

> Note:
> Environment: Paddle 3.0, PaddleMIX release/2.0, PaddleNLP 2.7.2, A100 80G.

### 3.1 Benchmark command

Add `--benchmark` to the run command in the corresponding model directory under `deploy` to obtain the model's running time.

Example: GroundingDino benchmark:

```bash
cd deploy/groundingdino
 python predict.py \
 --text_encoder_type GroundingDino/groundingdino-swint-ogc \
 --model_path output_groundingdino/GroundingDino/groundingdino-swint-ogc \
 --input_image https://bj.bcebos.com/v1/paddlenlp/models/community/GroundingDino/000000004505.jpg \
 --output_dir ./groundingdino_predict_output \
 --prompt "bus" \
 --benchmark True
```

| Model | Image size | dtype | Paddle Deploy (latency) |
|-|-|-|-|
| qwen-vl-7b | 448*448 | fp16 | 669.8 ms |
| llava-1.5-7b | 336*336 | fp16 | 981.2 ms |
| llava-1.6-7b | 336*336 | fp16 | 778.7 ms |
| GroundingDino/groundingdino-swint-ogc | 800*1193 | fp32 | 100 ms |
| Sam/SamVitH-1024 | 1024*1024 | fp32 | 121 ms |
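For models run through APPflow rather than a per-model `predict.py`, end-to-end latency can also be measured manually. The sketch below is an illustration, not part of the repository: it reuses only the Appflow call from section 1.1 plus Python's `time` module, warms up the predictor, and averages the timed runs.

```python
import time

from PIL import Image
from paddlemix.appflow import Appflow

task = Appflow(app="openset_det_sam",
               models=["GroundingDino/groundingdino-swint-ogc", "Sam/SamVitH-1024"],
               static_mode=True,
               precision="fp32")
image_pil = Image.open("beauty.png").convert("RGB")

# Warm-up runs so one-time costs (graph loading, caches) do not skew the timing.
for _ in range(3):
    task(image=image_pil, prompt="women")

# Timed runs: report the mean wall-clock latency in milliseconds.
runs = 10
start = time.perf_counter()
for _ in range(runs):
    task(image=image_pil, prompt="women")
elapsed = (time.perf_counter() - start) / runs
print(f"average latency: {elapsed * 1000:.1f} ms")
```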