---
license: apache-2.0
language:
- en
base_model:
- qihoo360/fg-clip2-base
tags:
- CLIP
- FG-CLIP
- FG-CLIP2
- Image-Text Encoder
---

# FG-CLIP2

This version of FG-CLIP2 has been converted to run on the Axera NPU using w8a16 quantization. It is compatible with Pulsar2 version 4.2.

For details on how to convert the FG-CLIP2 model into an axmodel that runs on an Axera NPU board, see the [ax_tools directory](https://github.com/Jordan-5i/FG-CLIP/tree/main/ax_tools).

## Supported Platform

- AX650

## On-board inference time

| Stage | Time |
|------|------|
| image_encoder | 125.197 ms |
| text_encoder | 10.817 ms |
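
As a rough illustration of what these latencies imply, the sketch below derives the combined latency for one image scored against several texts and an upper bound on single-stream image throughput. It assumes the encoders run back to back, that the text encoder handles one text per inference, and that pre- and post-processing cost nothing; none of these assumptions is stated in the table.

```python
# Latencies copied from the table above (milliseconds per inference).
IMAGE_ENCODER_MS = 125.197
TEXT_ENCODER_MS = 10.817


def sequential_latency_ms(num_texts: int = 1) -> float:
    """Latency for one image plus `num_texts` texts.

    Assumes the encoders run one after another, one text per
    text-encoder inference, with no other overhead.
    """
    return IMAGE_ENCODER_MS + num_texts * TEXT_ENCODER_MS


def images_per_second() -> float:
    """Upper bound on image-encoder throughput for a single stream."""
    return 1000.0 / IMAGE_ENCODER_MS


if __name__ == "__main__":
    print(f"1 image + 1 text:  {sequential_latency_ms(1):.3f} ms")
    print(f"1 image + 4 texts: {sequential_latency_ms(4):.3f} ms")
    print(f"image encoder throughput: {images_per_second():.2f} images/s")
```

Text embeddings for a fixed set of descriptions can be computed once and reused, in which case the image encoder dominates the per-image cost.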

## How to use

Download all files from this repository to the device, then run the following command:

```bash
python3 run_axmodel.py
```

Model input and output examples are as follows:

1. The input image:



2. The candidate descriptions of the image content (in Chinese). The first three describe a bedroom corner and differ only in fine-grained details (clothing colors, shoe types, and the plant); the fourth describes an unrelated street market:

```json
[
"一个简约风格的卧室角落,黑色金属衣架上挂着多件米色和白色的衣物,下方架子放着两双浅色鞋子,旁边是一盆绿植,左侧可见一张铺有白色床单和灰色枕头的床。",
"一个简约风格的卧室角落,黑色金属衣架上挂着多件红色和蓝色的衣物,下方架子放着两双黑色高跟鞋,旁边是一盆绿植,左侧可见一张铺有白色床单和灰色枕头的床。",
"一个简约风格的卧室角落,黑色金属衣架上挂着多件米色和白色的衣物,下方架子放着两双运动鞋,旁边是一盆仙人掌,左侧可见一张铺有白色床单和灰色枕头的床。",
"一个繁忙的街头市场,摊位上摆满水果,背景是高楼大厦,人们在喧闹中购物。"
]
```

3. The similarity between the image encoder output and each text encoder output; the first description scores highest (about 0.988):

```text
Logits per image: tensor([[9.8757e-01, 4.7755e-03, 7.6510e-03, 1.3484e-14]], dtype=torch.float64)
```
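
The values above sum to roughly 1, which is consistent with a softmax over image-text similarity scores. As a minimal sketch of how such scores can be derived from encoder outputs, the NumPy code below L2-normalizes the embeddings, takes scaled cosine similarities, and applies a softmax. The function name `logits_per_image`, the `logit_scale` value, and the random stand-in embeddings are assumptions for illustration; the actual post-processing in `run_axmodel.py` may differ.

```python
import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def logits_per_image(image_emb, text_emb, logit_scale=100.0):
    """Softmax over scaled cosine similarities, one row per image.

    `logit_scale` is a hypothetical value, not one read from the model.
    """
    image_emb = l2_normalize(np.asarray(image_emb, dtype=np.float64))
    text_emb = l2_normalize(np.asarray(text_emb, dtype=np.float64))
    scores = logit_scale * image_emb @ text_emb.T  # shape: (n_images, n_texts)
    scores -= scores.max(axis=-1, keepdims=True)   # subtract row max for numerical stability
    exp = np.exp(scores)
    return exp / exp.sum(axis=-1, keepdims=True)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image_emb = rng.normal(size=(1, 8))  # stand-in for the image encoder output
    text_emb = rng.normal(size=(4, 8))   # stand-in for four text encoder outputs
    # Make text 0 nearly identical to the image so it should win.
    text_emb[0] = image_emb[0] + 0.05 * rng.normal(size=8)
    print(logits_per_image(image_emb, text_emb))
```

Each row of the result is a probability distribution over the candidate texts, so the index of the largest entry is the best-matching description for that image.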