---
license: bsd-3-clause
language:
- en
base_model:
- HuggingFaceTB/SmolVLM-256M-Instruct
tags:
- SmolVLM
- Int8
- VLM
---

<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/SmolVLM_256_banner.png" width="800" height="auto" alt="Image description">

# SmolVLM-256M-Instruct-Int8

This version of SmolVLM-256M-Instruct has been converted to run on the Axera NPU using **w8a16** quantization.

Compatible with Pulsar2 version: 3.4-temp

## Conversion tool links

If you are interested in model conversion, you can export the axmodel yourself, starting from the original repository:
https://huggingface.co/HuggingFaceTB/SmolVLM-256M-Instruct

[How to convert an LLM from Hugging Face to axmodel](https://github.com/AXERA-TECH/SmolVLM-256M-Instruct.axera)

[AXera NPU HOST LLM Runtime](https://github.com/techshoww/ax-llm) 


## Supported Platforms

- AX650
  - AX650N DEMO Board
  - [M4N-Dock (AXera-Pi Pro)](https://wiki.sipeed.com/hardware/zh/maixIV/m4ndock/m4ndock.html)
  - [M.2 Accelerator card](https://axcl-docs.readthedocs.io/zh-cn/latest/doc_guide_hardware.html)
- AX630C
  - [AXera-Pi 2](https://axera-pi-2-docs-cn.readthedocs.io/zh-cn/latest/index.html)
  - [Module-LLM](https://docs.m5stack.com/zh_CN/module/Module-LLM)
  - [LLM630 Compute Kit](https://docs.m5stack.com/zh_CN/core/LLM630%20Compute%20Kit)
 
|Chip|Image encoder (512×512 input)|TTFT|Decode speed (w8a16)|
|--|--|--|--|
|AX650|105 ms|57 ms|80 tokens/sec|
|AX630C|800 ms|182 ms|31 tokens/sec|
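
As a rough reading of the table, the time to answer one image prompt is approximately the image-encoder time plus TTFT plus the number of generated tokens divided by the decode speed. The sketch below works that out with the table's figures. The helper name `estimate_latency_s` and the 256-token reply length are assumptions made for illustration, and the additive model ignores tokenizer and I/O overhead.

```python
# Figures copied from the table above (milliseconds and tokens/sec).
CHIPS = {
    "AX650": {"encode_ms": 105, "ttft_ms": 57, "tokens_per_s": 80},
    "AX630C": {"encode_ms": 800, "ttft_ms": 182, "tokens_per_s": 31},
}


def estimate_latency_s(chip: str, new_tokens: int) -> float:
    """Hypothetical helper: image encode + TTFT + decode time, in seconds."""
    c = CHIPS[chip]
    fixed_s = (c["encode_ms"] + c["ttft_ms"]) / 1000
    return fixed_s + new_tokens / c["tokens_per_s"]


if __name__ == "__main__":
    for chip in CHIPS:
        # 256 tokens is an arbitrary example reply length.
        print(f"{chip}: ~{estimate_latency_s(chip, 256):.2f} s for 256 tokens")
```

For long replies the decode term dominates, so the tokens/sec column matters more than the fixed encode and TTFT costs.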

## How to use

Download all files from this repository to the device. The working directory should then look like this:

```
root@ax650:/mnt/qtang/llm-test/smolvlm-256m # tree -L 1
.
├── main
├── post_config.json
├── run_smolvlm_ax630c.sh
├── run_smolvlm_ax650.sh
├── smolvlm-256m-ax630c
├── smolvlm-256m-ax650
├── smolvlm_tokenizer
├── smolvlm_tokenizer_512.py
└── ssd_car.jpg
```
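
Before running anything, it can help to confirm that the download is complete. The following is a minimal sketch that checks for the entries listed in the tree above; the helper name `missing_entries` is hypothetical and not part of this repository.

```python
from pathlib import Path

# Entries shown by `tree -L 1` in the listing above.
EXPECTED = [
    "main",
    "post_config.json",
    "run_smolvlm_ax630c.sh",
    "run_smolvlm_ax650.sh",
    "smolvlm-256m-ax630c",
    "smolvlm-256m-ax650",
    "smolvlm_tokenizer",
    "smolvlm_tokenizer_512.py",
    "ssd_car.jpg",
]


def missing_entries(root: str = ".") -> list[str]:
    """Return the expected files or directories that are absent under root."""
    base = Path(root)
    return [name for name in EXPECTED if not (base / name).exists()]


if __name__ == "__main__":
    missing = missing_entries()
    print("all files present" if not missing else f"missing: {missing}")
```

Run it from the directory that holds the downloaded files, or pass that directory as `root`.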

#### Install transformers

```
pip install transformers==4.41.1
```
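
To confirm that the pinned version is the one Python actually picks up, a small check along these lines may be useful. It uses only the standard library; the helper name `transformers_matches` is an assumption for illustration.

```python
from importlib import metadata

# Version pinned by the pip command above.
PINNED = "4.41.1"


def transformers_matches(pinned: str = PINNED) -> bool:
    """Return True only if transformers is installed at the pinned version."""
    try:
        return metadata.version("transformers") == pinned
    except metadata.PackageNotFoundError:
        # Package is not installed in this environment.
        return False


if __name__ == "__main__":
    print("transformers pinned version installed:", transformers_matches())
```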

#### Start the Tokenizer service

```
root@ax650:/mnt/qtang/llm-test/smolvlm-256m# python smolvlm_tokenizer_512.py --port 12345
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
1 <|im_start|> 49279 <end_of_utterance>
[1, 11126, 42, 49189, 49152, 49190, 49190, 49190, 49190, 49190, 49190, 49190, 49190, 49190, 49190, 49190,
49190, 49190, 49190, 49190, 49190, 49190, 49190, 49190, 49190, 49190, 49190, 49190, 49190, 49190, 49190,
49190, 49190, 49190, 49190, 49190, 49190, 49190, 49190, 49190, 49190, 49190, 49190, 49190, 49190, 49190,
49190, 49190, 49190, 49190, 49190, 49190, 49190, 49190, 49190, 49190, 49190, 49190, 49190, 49190, 49190,
49190, 49190, 49190, 49190, 49190, 49190, 49190, 49190, 49189, 7306, 346, 5125, 451, 2443, 47, 49279,
198, 9519, 9531, 42]
81
[1, 11126, 42, 28120, 905, 49279, 198, 9519, 9531, 42]
10
http://localhost:12345
```
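
The startup output above prints two sample token sequences with their lengths (81 and 10). The sketch below rebuilds both from the log and counts how often ID 49190 repeats in the first one. Reading 49190 as an image placeholder token is an interpretation of the log, not something the source states, and `placeholder_count` is a hypothetical helper.

```python
from collections import Counter

# First sequence from the log above, written compactly:
# ID 49190 appears as one uninterrupted run of 64 entries.
with_image = (
    [1, 11126, 42, 49189, 49152]
    + [49190] * 64
    + [49189, 7306, 346, 5125, 451, 2443, 47, 49279, 198, 9519, 9531, 42]
)

# Second sequence from the same log (no run of 49190).
text_only = [1, 11126, 42, 28120, 905, 49279, 198, 9519, 9531, 42]


def placeholder_count(ids, placeholder=49190):
    """Count occurrences of the (assumed) image placeholder ID."""
    return Counter(ids)[placeholder]


if __name__ == "__main__":
    print(len(with_image), len(text_only))  # matches the 81 and 10 in the log
    print(placeholder_count(with_image))    # 64
```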

#### Inference with an AX650 host, such as the M4N-Dock (AXera-Pi Pro) or the AX650N DEMO Board

- input text

```
Describe the picture
```

- input image

![](./ssd_car.jpg)

Keep the tokenizer service running, open another terminal, and run `./run_smolvlm_ax650.sh`:

```
root@ax650:/mnt/qtang/llm-test/smolvlm-256m# ./run_smolvlm_ax650.sh
[I][                            Init][ 106]: LLM init start
bos_id: 1, eos_id: 49279
  2% | █                                 |   1 /  34 [0.00s<0.14s, 250.00 count/s] tokenizer init ok
[I][                            Init][  26]: LLaMaEmbedSelector use mmap
100% | ████████████████████████████████ |  34 /  34 [0.67s<0.67s, 50.90 count/s] init vpm axmodel ok,remain_cmm(11698 MB)B)
[I][                            Init][ 254]: max_token_len : 1023
[I][                            Init][ 259]: kv_cache_size : 192, kv_cache_num: 1023
[I][                            Init][ 267]: prefill_token_num : 128
[I][                            Init][ 269]: vpm_height : 512,vpm_width : 512
[I][                            Init][ 279]: LLM init ok
Type "q" to exit, Ctrl+c to stop current running
prompt >> Describe the picture
image >> ./ssd_car.jpg
[I][                          Encode][ 338]: image encode time : 104.691002 ms, size : 36864
[I][                             Run][ 549]: ttft: 58.01 ms
 The image depicts a double decker bus, which is prominently displayed in the center of the image. The bus is red and has a large, bold sign on its roof that reads
"Things Get More Exciting When You Say So." The sign is in white text, and the bus is designed to be eye-catching and visually appealing.

The bus is parked on a city street, with a few other vehicles visible in the background. The street is lined with buildings, including a few shops and restaurants,
which are partially visible. The buildings are well-lit, and the street is clean and well-maintained.

In the foreground, there is a person standing in front of the bus. The person is wearing a dark jacket and appears to be waiting for the bus. The person is facing the bus,
and they seem to be waiting for the bus to arrive.

The bus is parked on the street, and it is not moving. The bus is not moving, and there are no other vehicles visible in the image. The street is well-maintained,
and the buildings are well-lit, indicating that it is a sunny day.

The image is taken from a slightly elevated perspective, which gives a clear view of the bus and the surrounding area. The lighting in the image is bright,
and the shadows are well-defined, indicating that the sun is shining brightly.

To summarize, the image depicts:
1. A double-decker bus with a large sign on its roof that reads "Things Get More Exciting When You Say So."
2. The bus is parked on a city street with a few other vehicles visible in the background.
3. The bus is not moving, and there are no other vehicles visible in the image.
4. The street is well-maintained, and the buildings are well-lit, indicating a sunny day.

This description provides a comprehensive overview of the image, allowing a text model to answer any questions related to the image based on the description.

[N][                             Run][ 688]: hit eos,avg 80.54 token/s

prompt >> q
root@ax650:/mnt/qtang/llm-test/smolvlm-256m#
```
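
To compare runs against the benchmark table, the timing figures can be pulled out of the runtime log. This is a minimal sketch that assumes the log line formats stay exactly as shown in the transcript above; `parse_metrics` and the dictionary keys are names chosen for illustration.

```python
import re

# Lines copied from the AX650 run log above.
LOG = """\
[I][                          Encode][ 338]: image encode time : 104.691002 ms, size : 36864
[I][                             Run][ 549]: ttft: 58.01 ms
[N][                             Run][ 688]: hit eos,avg 80.54 token/s
"""

# One regular expression per metric, keyed by a descriptive name.
PATTERNS = {
    "encode_ms": r"image encode time : ([\d.]+) ms",
    "ttft_ms": r"ttft: ([\d.]+) ms",
    "tokens_per_s": r"avg ([\d.]+) token/s",
}


def parse_metrics(log: str) -> dict:
    """Extract whichever metrics are present in the log text."""
    metrics = {}
    for key, pattern in PATTERNS.items():
        match = re.search(pattern, log)
        if match:
            metrics[key] = float(match.group(1))
    return metrics


if __name__ == "__main__":
    print(parse_metrics(LOG))
```

Metrics missing from the log are simply left out of the result, so the same function also works on partial output.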

#### Inference with M.2 Accelerator card

[What is the M.2 Accelerator card?](https://axcl-docs.readthedocs.io/zh-cn/latest/doc_guide_hardware.html) This demo is based on a Raspberry Pi 5.

*TODO*