FG-CLIP2

This version of FG-CLIP2 has been converted to run on the Axera NPU using w8a16 quantization. It is compatible with Pulsar2 version 4.2.

To learn how to convert the FG-CLIP2 model into an axmodel that runs on an Axera NPU board, please read this link in detail.

Supported Platform

AX650

On-board inference time

Stage	Time
image_encoder	125.197 ms
text_encoder	10.817 ms
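
As a rough guide to how these per-stage timings combine, the sketch below estimates the total encoder time for one image scored against several text prompts. It assumes each input is encoded in its own call and that the two stages do not overlap; neither assumption is confirmed by this repository, and the function name `estimate_latency_ms` is hypothetical.

```python
# Per-stage latencies in milliseconds, copied from the table above.
IMAGE_ENCODER_MS = 125.197
TEXT_ENCODER_MS = 10.817


def estimate_latency_ms(num_images: int, num_texts: int) -> float:
    """Estimate total encoder time in milliseconds.

    Assumption (not confirmed by the repository): every image and every
    text prompt is encoded in a separate, sequential call.
    """
    return num_images * IMAGE_ENCODER_MS + num_texts * TEXT_ENCODER_MS


if __name__ == "__main__":
    # One image compared against four candidate descriptions.
    total = estimate_latency_ms(num_images=1, num_texts=4)
    print(f"{total:.3f} ms")
```

The estimate covers encoder time only; pre-processing, tokenization, and the final similarity computation are not included.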

How to use

Download all files from this repository to the device

Run the following command:

python3 run_axmodel.py

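Example model input and output:

Input image:
Candidate descriptions of the image content (Chinese text prompts: three near-identical bedroom descriptions that differ in clothing colour, shoe type, and plant, plus one unrelated street-market description):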

   [
    "一个简约风格的卧室角落，黑色金属衣架上挂着多件米色和白色的衣物，下方架子放着两双浅色鞋子，旁边是一盆绿植，左侧可见一张铺有白色床单和灰色枕头的床。",
    "一个简约风格的卧室角落，黑色金属衣架上挂着多件红色和蓝色的衣物，下方架子放着两双黑色高跟鞋，旁边是一盆绿植，左侧可见一张铺有白色床单和灰色枕头的床。",
    "一个简约风格的卧室角落，黑色金属衣架上挂着多件米色和白色的衣物，下方架子放着两双运动鞋，旁边是一盆仙人掌，左侧可见一张铺有白色床单和灰色枕头的床。",
    "一个繁忙的街头市场，摊位上摆满水果，背景是高楼大厦，人们在喧闹中购物。"
  ]

The similarity between the image encoder output and each text encoder output is:

Logits per image: tensor([[9.8757e-01, 4.7755e-03, 7.6510e-03, 1.3484e-14]], dtype=torch.float64)
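
The printed values sum to roughly 1, which is consistent with a softmax over per-text similarity scores. The sketch below shows one common way such scores are derived from encoder outputs: L2-normalize the embeddings, take dot products, scale, and apply softmax. It is an illustration only and not necessarily what `run_axmodel.py` does; the function names, the toy vectors, and the `logit_scale` value are all assumptions.

```python
import math


def l2_normalize(vec):
    """Return vec scaled to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def image_text_probs(image_emb, text_embs, logit_scale=100.0):
    """Softmax over scaled cosine similarities between one image and N texts.

    logit_scale is a hypothetical value; the real model ships its own
    learned scale (and possibly a bias), which is not shown here.
    """
    img = l2_normalize(image_emb)
    logits = []
    for text_emb in text_embs:
        txt = l2_normalize(text_emb)
        cosine = sum(a * b for a, b in zip(img, txt))
        logits.append(logit_scale * cosine)
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    # Toy 3-dimensional embeddings, made up for illustration.
    image = [1.0, 0.0, 0.0]
    texts = [[0.9, 0.1, 0.0], [0.0, 1.0, 0.0], [0.5, 0.5, 0.0]]
    print(image_text_probs(image, texts))
```

With a large scale factor, small differences in cosine similarity turn into a sharply peaked distribution, which would explain why one description receives almost all of the probability mass in the output above.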

Base model: qihoo360/fg-clip2-base