AXERA-TECH/FG-CLIP

Image-Text Encoder

jordan0811 committed 17 days ago

Commit

0315ad2

1 Parent(s): 7de6efb

Create README.md

Files changed (1)

README.md +58 -0

README.md ADDED

	@@ -0,0 +1,58 @@

+---
+license: apache-2.0
+language:
+- en
+base_model:
+- qihoo360/fg-clip2-base
+tags:
+- CLIP
+- FG-CLIP
+- FG-CLIP2
+- Image-Text Encoder
+---
+# FG-CLIP2
+This version of FG-CLIP2 has been converted to run on the Axera NPU using w8a16 quantization (8-bit weights, 16-bit activations). It is compatible with Pulsar2 version 4.2.
+To learn how to convert the FG-CLIP2 model into an axmodel that runs on an Axera NPU board, see the [conversion guide](https://github.com/Jordan-5i/FG-CLIP/tree/main/ax_tools).
+## Supported Platforms
+- AX650
+## On-board inference time
+  | Stage | Time |
+  |------|------|
+  | image_encoder |  125.197 ms  |
+  | text_encoder |  10.817 ms  |
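As a rough illustration of how the two figures above combine, the sketch below estimates the encoder time for a request with one image and several text prompts. It assumes one encoder call per input (batch size 1), sequential execution, and no pre- or post-processing cost. None of these assumptions is stated in the table, and the helper name `estimated_latency_ms` is hypothetical.

```python
# Per-call encoder times taken from the table above (AX650, w8a16).
IMAGE_ENCODER_MS = 125.197
TEXT_ENCODER_MS = 10.817


def estimated_latency_ms(num_images: int, num_texts: int) -> float:
    """Sum of encoder times, assuming one sequential call per input.

    Assumption: batch size 1 per call; pre/post-processing is ignored.
    """
    return num_images * IMAGE_ENCODER_MS + num_texts * TEXT_ENCODER_MS


# One image compared against four descriptions, as in the usage example.
total = estimated_latency_ms(num_images=1, num_texts=4)
print(f"{total:.3f} ms")
```

Under these assumptions the image encoder dominates the total, so encoding an image once and reusing its embedding across many text queries is the cheaper pattern.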
+## How to use
+Download all files from this repository to the device, then run the following command:
+```bash
+python3 run_axmodel.py
+```
+Example model input and output:
+1. The input image:
+   ![](bedroom.jpg)
+2. The candidate text descriptions of the image:
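+```python
+   [
+    # EN: A minimalist bedroom corner: a black metal clothes rack holds several beige and white garments, two pairs of light-colored shoes sit on the shelf below, a green potted plant stands beside it, and a bed with white sheets and gray pillows is visible on the left.
+    "一个简约风格的卧室角落，黑色金属衣架上挂着多件米色和白色的衣物，下方架子放着两双浅色鞋子，旁边是一盆绿植，左侧可见一张铺有白色床单和灰色枕头的床。",
+    # EN: Same scene, but with red and blue garments and two pairs of black high heels.
+    "一个简约风格的卧室角落，黑色金属衣架上挂着多件红色和蓝色的衣物，下方架子放着两双黑色高跟鞋，旁边是一盆绿植，左侧可见一张铺有白色床单和灰色枕头的床。",
+    # EN: Same scene, but with two pairs of sneakers and a potted cactus.
+    "一个简约风格的卧室角落，黑色金属衣架上挂着多件米色和白色的衣物，下方架子放着两双运动鞋，旁边是一盆仙人掌，左侧可见一张铺有白色床单和灰色枕头的床。",
+    # EN: A busy street market with stalls full of fruit, high-rise buildings in the background, and people shopping amid the bustle.
+    "一个繁忙的街头市场，摊位上摆满水果，背景是高楼大厦，人们在喧闹中购物。"
+  ]
+```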
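+3. The similarity scores computed from the image encoder and text encoder outputs, one per description in the order listed above. The first description receives the highest score (about 0.988):
+   ```text
+   Logits per image: tensor([[9.8757e-01, 4.7755e-03, 7.6510e-03, 1.3484e-14]], dtype=torch.float64)
+   ```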
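The scores above sum to roughly 1, which suggests a softmax over the four descriptions. The sketch below shows one common CLIP-style way to turn an image embedding and several text embeddings into such scores: L2-normalize, take cosine similarities, scale, and apply softmax. It is not taken from `run_axmodel.py`; the function names, the toy 4-dimensional embeddings, and the `logit_scale` value of 100.0 are all assumptions made for illustration.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec]


def image_text_probs(image_emb, text_embs, logit_scale=100.0):
    """Softmax over scaled cosine similarities (logit_scale is a placeholder)."""
    img = l2_normalize(image_emb)
    logits = []
    for text_emb in text_embs:
        txt = l2_normalize(text_emb)
        cosine = sum(a * b for a, b in zip(img, txt))
        logits.append(logit_scale * cosine)
    # Subtract the maximum logit before exponentiating for numerical stability.
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Toy embeddings (hypothetical): the first text is close to the image.
image_emb = [0.9, 0.1, 0.3, -0.2]
text_embs = [
    [0.88, 0.12, 0.31, -0.18],
    [0.1, 0.9, -0.3, 0.2],
    [0.5, -0.4, 0.6, 0.1],
    [-0.7, 0.2, 0.1, 0.6],
]
probs = image_text_probs(image_emb, text_embs)
best = max(range(len(probs)), key=probs.__getitem__)
print(best, [round(p, 4) for p in probs])
```

With a large scale factor, small differences in cosine similarity become large differences in probability, which is consistent with the near one-hot scores shown above.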