---
license: bsd-3-clause
language:
- en
base_model:
- HuggingFaceTB/SmolVLM-256M-Instruct
tags:
- SmolVLM
- Int8
- VLM
---
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/SmolVLM_256_banner.png" width="800" height="auto" alt="Image description">
# SmolVLM-256M-Instruct-Int8
This version of SmolVLM-256M-Instruct has been converted to run on the Axera NPU using **w8a16** quantization.
This model has been optimized with the following LoRA:
Compatible with Pulsar2 version: 3.4-temp
## Conversion tool links
If you are interested in model conversion, you can export the axmodel yourself, starting from the original repository:
https://huggingface.co/HuggingFaceTB/SmolVLM-256M-Instruct
[How to convert an LLM from Hugging Face to axmodel](https://github.com/AXERA-TECH/SmolVLM-256M-Instruct.axera)
[AXera NPU HOST LLM Runtime](https://github.com/techshoww/ax-llm)
## Supported platforms
- AX650
  - AX650N DEMO Board
  - [M4N-Dock (AXera-Pi Pro)](https://wiki.sipeed.com/hardware/zh/maixIV/m4ndock/m4ndock.html)
  - [M.2 Accelerator card](https://axcl-docs.readthedocs.io/zh-cn/latest/doc_guide_hardware.html)
- AX630C
  - [AXera-Pi 2](https://axera-pi-2-docs-cn.readthedocs.io/zh-cn/latest/index.html)
  - [Module-LLM](https://docs.m5stack.com/zh_CN/module/Module-LLM)
  - [LLM630 Compute Kit](https://docs.m5stack.com/zh_CN/core/LLM630%20Compute%20Kit)
|Chip|Image encoder (512x512 input)|Time to first token (TTFT)|Decode speed (w8a16)|
|--|--|--|--|
|AX650| 105 ms | 57 ms |80 tokens/sec|
|AX630C| 800 ms | 182 ms |31 tokens/sec|
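
The three columns can be combined into a rough end-to-end latency estimate for a single image prompt. The sketch below is only a back-of-the-envelope model: it assumes the TTFT already covers the first generated token and that the decode rate stays constant for the rest of the response. `BENCHMARKS` and `estimate_latency_ms` are illustrative names, not part of this repository.

```python
# Figures copied from the benchmark table above.
BENCHMARKS = {
    "AX650": {"encode_ms": 105.0, "ttft_ms": 57.0, "tokens_per_s": 80.0},
    "AX630C": {"encode_ms": 800.0, "ttft_ms": 182.0, "tokens_per_s": 31.0},
}


def estimate_latency_ms(chip: str, new_tokens: int) -> float:
    """Rough time from image input to the last generated token, in ms."""
    b = BENCHMARKS[chip]
    # Assumption: TTFT covers the first token; the remaining tokens
    # are decoded at the constant rate listed in the table.
    decode_ms = max(new_tokens - 1, 0) / b["tokens_per_s"] * 1000.0
    return b["encode_ms"] + b["ttft_ms"] + decode_ms


if __name__ == "__main__":
    for chip in BENCHMARKS:
        print(chip, round(estimate_latency_ms(chip, 100)), "ms for 100 tokens")
```

Real runs will deviate from this estimate, since the table values are single measurements and the prompt length also affects TTFT.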
## How to use
Download all files from this repository to the device. The directory should then look like this:
```
root@ax650:/mnt/qtang/llm-test/smolvlm-256m # tree -L 1
.
├── main
├── post_config.json
├── run_smolvlm_ax630c.sh
├── run_smolvlm_ax650.sh
├── smolvlm-256m-ax630c
├── smolvlm-256m-ax650
├── smolvlm_tokenizer
├── smolvlm_tokenizer_512.py
└── ssd_car.jpg
```
#### Install transformers
```
pip install transformers==4.41.1
```
#### Start the Tokenizer service
```
root@ax650:/mnt/qtang/llm-test/smolvlm-256m# python smolvlm_tokenizer_512.py --port 12345
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
1 <|im_start|> 49279 <end_of_utterance>
[1, 11126, 42, 49189, 49152, 49190, 49190, 49190, 49190, 49190, 49190, 49190, 49190, 49190, 49190, 49190,
49190, 49190, 49190, 49190, 49190, 49190, 49190, 49190, 49190, 49190, 49190, 49190, 49190, 49190, 49190,
49190, 49190, 49190, 49190, 49190, 49190, 49190, 49190, 49190, 49190, 49190, 49190, 49190, 49190, 49190,
49190, 49190, 49190, 49190, 49190, 49190, 49190, 49190, 49190, 49190, 49190, 49190, 49190, 49190, 49190,
49190, 49190, 49190, 49190, 49190, 49190, 49190, 49190, 49189, 7306, 346, 5125, 451, 2443, 47, 49279,
198, 9519, 9531, 42]
81
[1, 11126, 42, 28120, 905, 49279, 198, 9519, 9531, 42]
10
http://localhost:12345
```
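
The first ID list printed at startup has a visible structure: a short prefix, a long run of the same ID, and a text suffix. The sketch below rebuilds that list and measures the run. It assumes that ID 49190 is the image placeholder token and that the placeholders form one contiguous run; the startup log does not label either fact. `split_prompt` is a hypothetical helper, not a function from `smolvlm_tokenizer_512.py`.

```python
def split_prompt(ids, image_token_id=49190):
    """Return (tokens before the image run, run length, tokens after it).

    Assumes the image placeholders form a single contiguous run.
    """
    first = ids.index(image_token_id)
    last = len(ids) - 1 - ids[::-1].index(image_token_id)
    return ids[:first], last - first + 1, ids[last + 1:]


# Rebuild the 81-token prompt printed above: a 5-token prefix,
# the repeated ID 49190, then a 12-token suffix.
prefix = [1, 11126, 42, 49189, 49152]
suffix = [49189, 7306, 346, 5125, 451, 2443, 47, 49279, 198, 9519, 9531, 42]
ids = prefix + [49190] * (81 - len(prefix) - len(suffix)) + suffix

before, n_image, after = split_prompt(ids)
print(len(ids), n_image)  # prints: 81 64
```

Under this reading, one 512x512 image occupies 64 positions of the prompt. The encoder log line in the inference output reports `size : 36864`, which equals 64 x 576; that would fit 64 embeddings of width 576, although the log does not say what the size counts.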
#### Inference with an AX650 host, such as the M4N-Dock (AXera-Pi Pro) or AX650N DEMO Board
- input text
```
Describe the picture
```
- input image

Open another terminal and run `./run_smolvlm_ax650.sh`
```
root@ax650:/mnt/qtang/llm-test/smolvlm-256m# ./run_smolvlm_ax650.sh
[I][ Init][ 106]: LLM init start
bos_id: 1, eos_id: 49279
2% | █ | 1 / 34 [0.00s<0.14s, 250.00 count/s] tokenizer init ok
[I][ Init][ 26]: LLaMaEmbedSelector use mmap
100% | ████████████████████████████████ | 34 / 34 [0.67s<0.67s, 50.90 count/s] init vpm axmodel ok,remain_cmm(11698 MB)B)
[I][ Init][ 254]: max_token_len : 1023
[I][ Init][ 259]: kv_cache_size : 192, kv_cache_num: 1023
[I][ Init][ 267]: prefill_token_num : 128
[I][ Init][ 269]: vpm_height : 512,vpm_width : 512
[I][ Init][ 279]: LLM init ok
Type "q" to exit, Ctrl+c to stop current running
prompt >> Describe the picture
image >> ./ssd_car.jpg
[I][ Encode][ 338]: image encode time : 104.691002 ms, size : 36864
[I][ Run][ 549]: ttft: 58.01 ms
The image depicts a double decker bus, which is prominently displayed in the center of the image. The bus is red and has a large, bold sign on its roof that reads
"Things Get More Exciting When You Say So." The sign is in white text, and the bus is designed to be eye-catching and visually appealing.
The bus is parked on a city street, with a few other vehicles visible in the background. The street is lined with buildings, including a few shops and restaurants,
which are partially visible. The buildings are well-lit, and the street is clean and well-maintained.
In the foreground, there is a person standing in front of the bus. The person is wearing a dark jacket and appears to be waiting for the bus. The person is facing the bus,
and they seem to be waiting for the bus to arrive.
The bus is parked on the street, and it is not moving. The bus is not moving, and there are no other vehicles visible in the image. The street is well-maintained,
and the buildings are well-lit, indicating that it is a sunny day.
The image is taken from a slightly elevated perspective, which gives a clear view of the bus and the surrounding area. The lighting in the image is bright,
and the shadows are well-defined, indicating that the sun is shining brightly.
To summarize, the image depicts:
1. A double-decker bus with a large sign on its roof that reads "Things Get More Exciting When You Say So."
2. The bus is parked on a city street with a few other vehicles visible in the background.
3. The bus is not moving, and there are no other vehicles visible in the image.
4. The street is well-maintained, and the buildings are well-lit, indicating a sunny day.
This description provides a comprehensive overview of the image, allowing a text model to answer any questions related to the image based on the description.
[N][ Run][ 688]: hit eos,avg 80.54 token/s
prompt >> q
root@ax650:/mnt/qtang/llm-test/smolvlm-256m#
```
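
When comparing boards or model builds, it can help to pull the timing figures out of a captured log rather than reading them by eye. The sketch below is one way to do that, assuming the log lines keep the exact wording shown in the session above; the regular expressions and the `parse_metrics` name are illustrative and not part of the runtime.

```python
import re

# Patterns written against the log lines shown in the session above.
PATTERNS = {
    "image_encode_ms": re.compile(r"image encode time\s*:\s*([\d.]+)\s*ms"),
    "ttft_ms": re.compile(r"ttft:\s*([\d.]+)\s*ms"),
    "tokens_per_s": re.compile(r"avg\s+([\d.]+)\s+token/s"),
}


def parse_metrics(log_text):
    """Extract the timing figures that appear in a captured run log."""
    metrics = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(log_text)
        if match:  # a metric that is missing from the log is simply skipped
            metrics[name] = float(match.group(1))
    return metrics


sample = """\
[I][ Encode][ 338]: image encode time : 104.691002 ms, size : 36864
[I][ Run][ 549]: ttft: 58.01 ms
[N][ Run][ 688]: hit eos,avg 80.54 token/s
"""
print(parse_metrics(sample))
```

A log could be captured with something like `./run_smolvlm_ax650.sh | tee run.log` and then passed to `parse_metrics` as a string.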
#### Inference with M.2 Accelerator card
[What is the M.2 Accelerator card?](https://axcl-docs.readthedocs.io/zh-cn/latest/doc_guide_hardware.html) This demo is based on a Raspberry Pi 5.
*TODO*