
	---
	license: apache-2.0
	language:
	- en
	base_model:
	- qihoo360/fg-clip2-base
	tags:
	- CLIP
	- FG-CLIP
	- FG-CLIP2
	- Image-Text Encoder
	---

	# FG-CLIP2

	This version of FG-CLIP2 has been converted to run on the Axera NPU using w8a16 quantization. Compatible Pulsar2 version: 4.2.

	For details on converting the FG-CLIP2 model into an axmodel that runs on an Axera NPU board, see the [ax_tools directory](https://github.com/Jordan-5i/FG-CLIP/tree/main/ax_tools).


	## Supported Platforms
	- AX650

	## On-board inference time
	| Stage | Time |
	|------|------|
	| image_encoder | 125.197 ms |
	| text_encoder | 10.817 ms |

	## How to use

	Download all files from this repository to the device.
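One possible way to fetch the files is the `huggingface_hub` Python package, sketched below. This is an assumption rather than a documented step of this repository: the package must be installed separately, the repository ID is taken from this model page, and the `download_repo` helper and the `FG-CLIP` target directory are hypothetical names chosen for illustration.

```python
# Minimal sketch: download every file of this repository into a local folder.
# Assumes `pip install huggingface_hub` has been run on the device.

REPO_ID = "AXERA-TECH/FG-CLIP"  # repository ID as shown on this model page


def download_repo(local_dir="FG-CLIP"):
    """Download all repository files and return the local directory path.

    `local_dir` is a hypothetical target folder, not one the README prescribes.
    """
    # Imported lazily so the module can be loaded without the package present.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=REPO_ID, local_dir=local_dir)
```

Calling `download_repo()` on the device would place the files in `FG-CLIP/`, from which the command in the next step can be run.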

	Run the following command:
	```bash
	python3 run_axmodel.py
	```
	Model input and output examples are as follows:
	1. The input image:

	![](bedroom.jpg)

	2. The description of the image content:

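	```python
	[
	# A corner of a minimalist bedroom: several beige and white garments hang on a black metal clothes rack, two pairs of light-colored shoes sit on the shelf below, a potted green plant stands beside it, and a bed with white sheets and gray pillows is visible on the left.
	"一个简约风格的卧室角落，黑色金属衣架上挂着多件米色和白色的衣物，下方架子放着两双浅色鞋子，旁边是一盆绿植，左侧可见一张铺有白色床单和灰色枕头的床。",
	# Same scene, but with red and blue garments and two pairs of black high heels.
	"一个简约风格的卧室角落，黑色金属衣架上挂着多件红色和蓝色的衣物，下方架子放着两双黑色高跟鞋，旁边是一盆绿植，左侧可见一张铺有白色床单和灰色枕头的床。",
	# Same scene, but with two pairs of sneakers and a potted cactus.
	"一个简约风格的卧室角落，黑色金属衣架上挂着多件米色和白色的衣物，下方架子放着两双运动鞋，旁边是一盆仙人掌，左侧可见一张铺有白色床单和灰色枕头的床。",
	# A busy street market with stalls full of fruit, high-rise buildings in the background, and people shopping amid the bustle.
	"一个繁忙的街头市场，摊位上摆满水果，背景是高楼大厦，人们在喧闹中购物。"
	]
	```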

	3. The similarity between the image encoder output and each text encoder output:

	```text
	Logits per image: tensor([[9.8757e-01, 4.7755e-03, 7.6510e-03, 1.3484e-14]], dtype=torch.float64)
	```
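The four values above sum to approximately 1, which is consistent with a softmax over the candidate descriptions even though the line is labeled "Logits per image". That is an inference from the numbers, not something this README states. The sketch below shows one common way such scores are derived from image and text embeddings: L2-normalize, take cosine similarities, scale, and apply softmax. The embeddings are toy values, and `logit_scale=100.0` and the function names are assumptions for illustration; they are not taken from `run_axmodel.py`.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def image_text_probs(image_emb, text_embs, logit_scale=100.0):
    """Return softmax probabilities of one image against several texts.

    logit_scale is a hypothetical temperature; the real model's value may differ.
    """
    img = l2_normalize(image_emb)
    logits = []
    for text_emb in text_embs:
        txt = l2_normalize(text_emb)
        cosine = sum(a * b for a, b in zip(img, txt))
        logits.append(logit_scale * cosine)

    # Subtract the maximum before exponentiating for numerical stability.
    max_logit = max(logits)
    exps = [math.exp(value - max_logit) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Toy embeddings: the image is closest to the first text.
image_emb = [0.9, 0.1, 0.0]
text_embs = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
probs = image_text_probs(image_emb, text_embs)
print(probs)
```

Under this reading, the first description is the clear best match in the example output, the two near-miss descriptions receive under 1% each, and the unrelated street market scene is effectively zero.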