---

license: bsd-3-clause
language:
  - en
  - zh
base_model:
  - HuggingFaceTB/SmolVLM2-500M-Video-Instruct
pipeline_tag: visual-question-answering
tags:
  - HuggingFaceTB
  - SmolVLM2-500M-Video-Instruct
---


# SmolVLM2-500M-Video-Instruct-Int8

This version of SmolVLM2-500M-Video-Instruct has been converted to run on the Axera NPU using **w8a16** quantization.

Compatible with Pulsar2 version: 4.0

## Conversion tool links

If you are interested in model conversion, you can try exporting the axmodel yourself, starting from the original repo:
- https://huggingface.co/HuggingFaceTB/SmolVLM2-500M-Video-Instruct

- [Github for SmolVLM2-500M-Video-Instruct.axera](https://github.com/AXERA-TECH/SmolVLM2-500M-Video-Instruct.axera)

- [Pulsar2 Link, How to Convert LLM from Huggingface to axmodel](https://pulsar2-docs.readthedocs.io/en/latest/appendix/build_llm.html)

## Supported Platforms
- AX650
  - [M4N-Dock(爱芯派Pro)](https://wiki.sipeed.com/hardware/zh/maixIV/m4ndock/m4ndock.html)

<!-- ## TODO Model infer time -->

## How to use

Download all files from this repository to the device.
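One way to script the download is sketched below. It assumes the `huggingface_hub` Python package is installed on the machine doing the download. The `download_repo` helper is a hypothetical name, and the repository ID is something you must supply yourself, since it is not stated in this README.

```python
def download_repo(repo_id, local_dir="SmolVLM2-500M-Video-Instruct"):
    """Download every file of a Hugging Face repo into local_dir.

    repo_id is an assumption left to the caller: pass the ID of this
    model repository, in the form "<namespace>/<repo-name>".
    """
    if not repo_id:
        raise ValueError("repo_id must be a non-empty '<namespace>/<repo-name>' string")

    # Imported lazily so the helper can be defined without the package installed.
    from huggingface_hub import snapshot_download

    # snapshot_download fetches the whole repository and returns the local path.
    return snapshot_download(repo_id=repo_id, local_dir=local_dir)
```

Call it as `download_repo("<namespace>/<repo-name>")`, then copy the resulting directory to the board if the download ran on another machine.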

**Using AX650 Board**

```bash
ai@ai-bj ~/yongqiang/SmolVLM2-500M-Video-Instruct $ tree -L 1
.
├── assets
├── embeds
├── infer_axmodel.py
├── README.md
├── smolvlm2_axmodel
├── smolvlm2_tokenizer
└── vit_mdoel

5 directories, 2 files
```
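Before running inference, it can help to confirm that the download is complete. The sketch below is not part of the repository: `missing_entries` is a hypothetical helper, and the expected names are copied from the `tree -L 1` listing above, including the `vit_mdoel` spelling used there.

```python
from pathlib import Path

# Entries copied from the `tree -L 1` listing (5 directories, 2 files).
EXPECTED_DIRS = [
    "assets",
    "embeds",
    "smolvlm2_axmodel",
    "smolvlm2_tokenizer",
    "vit_mdoel",  # spelled as in the listing
]
EXPECTED_FILES = ["infer_axmodel.py", "README.md"]


def missing_entries(root):
    """Return the expected directories and files that are absent under root."""
    root = Path(root)
    missing = [d for d in EXPECTED_DIRS if not (root / d).is_dir()]
    missing += [f for f in EXPECTED_FILES if not (root / f).is_file()]
    return missing


if __name__ == "__main__":
    gaps = missing_entries(".")
    print("layout OK" if not gaps else f"missing: {gaps}")
```

Run it from the repository root on the board. An empty result means the top-level layout matches the listing, but it does not verify the contents of each directory.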

#### Inference on an AX650 host, such as the M4N-Dock(爱芯派Pro) or the AX650N demo board

**Multimodal Understanding**

Input image:

![](assets/bee.jpg)

Input text:

```
Can you describe this image?
```

Log output:

```bash
ai@ai-bj ~/yongqiang/SmolVLM2-500M-Video-Instruct $ python3 infer_axmodel.py

input prompt: Can you describe this image?

answer >>  The image depicts a close-up view of a pink flower with a bee on it. The bee, which appears to be a bumblebee, is perched on the flower's center, which is surrounded by a cluster of other flowers. The bee is in the process of collecting nectar from the flower, which is a common behavior for bees. The flower itself has a yellow center with a cluster of yellow stamens surrounding it. The petals of the flower are a vibrant shade of pink, and the bee is positioned very close to the camera, making it the focal point of the image. The background of the image is slightly blurred, but it appears to be a garden or a field with other flowers and plants, contributing to the overall natural setting of the image.
```