Qwen2.5-VL-3B-Instruct
This version of Qwen2.5-VL-3B-Instruct has been converted to run on the Axera NPU using w8a16 quantization.
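As background, w8a16 means the weights are stored as 8-bit integers while activations stay in 16-bit floating point. A minimal sketch of a generic symmetric weight-quantization scheme (illustrative only — not necessarily the exact scheme Pulsar2 implements):

```python
# Illustrative w8a16-style weight quantization: int8 weight codes + a float
# scale; activations would remain 16-bit floats at inference time.
def quantize_w8(weights: list[float]) -> tuple[list[int], float]:
    """Symmetric per-tensor quantization to the int8 range [-127, 127]."""
    scale = max(abs(w) for w in weights) / 127.0 or 1.0  # avoid scale == 0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q: list[int], scale: float) -> list[float]:
    """Recover approximate float weights from int8 codes."""
    return [x * scale for x in q]

w = [0.5, -1.27, 0.0, 1.27]
q, s = quantize_w8(w)
print(q)  # [50, -127, 0, 127]
```

The weight memory shrinks roughly 2x versus bf16 at a small accuracy cost, which is why the DDR/Flash figures below fit a 3B VLM on the board.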
Compatible with Pulsar2 version: 3.4
Conversion tool links:
If you are interested in model conversion, you can try exporting the axmodel yourself from the original repo: https://huggingface.co/Qwen/Qwen2.5-VL-3B-Instruct
Pulsar2 documentation: How to Convert LLM from Huggingface to axmodel
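For orientation, an llm_build invocation looks roughly like the sketch below. The flag names (`--input_path`, `--output_path`, `--kv_cache_len`, `--hidden_state_type`, `--weight_type`) are assumptions recalled from the Pulsar2 LLM build guide — verify them against the linked documentation for your Pulsar2 version (3.4) before running.

```shell
# Hypothetical Pulsar2 conversion command -- flag names are assumptions,
# check the "How to Convert LLM from Huggingface to axmodel" guide.
# pulsar2 only exists inside its docker image, so the command is printed here,
# not executed.
CHIP=AX650
CMD="pulsar2 llm_build \
  --input_path Qwen/Qwen2.5-VL-3B-Instruct \
  --output_path Qwen2.5-VL-3B-Instruct-ax650 \
  --kv_cache_len 1023 \
  --hidden_state_type bf16 \
  --weight_type s8 \
  --chip ${CHIP}"
echo "${CMD}"
```

The `--kv_cache_len 1023` value matches the `kv_cache_num: 1023` reported in the init logs below; `s8` weights correspond to the w8a16 quantization of this release.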
Supported Platforms
- AX650
- AX650N DEMO Board
- M4N-Dock (AXera-Pi Pro)
- M.2 Accelerator card
Image Processing
Chip | Input size | Image num | Image encoder latency | TTFT (320 tokens) | Decode speed (w8a16) | DDR | Flash |
---|---|---|---|---|---|---|---|
AX650 | 448*448 | 1 | 780 ms | 2857 ms | 6.2 tokens/s | 4.3 GiB | 4.6 GiB |
Video Processing
Chip | Input size | Image num | Image encoder latency | TTFT (512 tokens) | Decode speed (w8a16) | DDR | Flash |
---|---|---|---|---|---|---|---|
AX650 | 308*308 | 8 | 1400 ms | 5400 ms | 6.1 tokens/s | 4.4 GiB | 4.7 GiB |
The DDR figure is the CMM memory the model consumes; make sure the CMM allocation on the development board is larger than this value.
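The TTFT and decode-speed columns combine into a rough end-to-end latency estimate: total time ≈ TTFT + output_tokens / decode speed. A small sketch with the image-table numbers (the function name is illustrative):

```python
# Rough end-to-end latency estimate from the benchmark tables above.
# TTFT already covers image encoding + prefill; decoding then proceeds
# at the steady-state tokens/sec rate.

def estimate_latency_s(ttft_ms: float, decode_tps: float, out_tokens: int) -> float:
    """Return estimated seconds until the last output token is produced."""
    return ttft_ms / 1000.0 + out_tokens / decode_tps

# AX650, one 448*448 image, 128-token answer:
print(round(estimate_latency_s(2857, 6.2, 128), 1))  # ~23.5 s
```

The same formula with the video row (5400 ms TTFT, 6.1 tokens/s) shows why multi-frame prompts feel noticeably slower to first token but decode at nearly the same rate.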
How to use
Download all the files in this repository to the device.
If you are using an AX650 board:
```
root@ax650:/mnt/qtang/llm-test/qwen2.5-vl-3b# tree -L 2
.
├── image
│   └── ssd_car.jpg
├── main
├── python
│   ├── cv_resize.py
│   ├── infer_image.py
│   ├── infer_text.py
│   ├── infer_video.py
│   ├── preprocess.py
│   └── utils.py
├── qwen2_5-vl-3b-image-ax650
│   ├── Qwen2.5-VL-3B-Instruct_vision_nchw448.axmodel
│   ├── model.embed_tokens.weight.bfloat16.bin
│   ├── qwen2_5_vl_p320_l0_together.axmodel
......
│   ├── qwen2_5_vl_p320_l9_together.axmodel
│   └── qwen2_5_vl_post.axmodel
├── qwen2_5-vl-3b-video-ax650
│   ├── Qwen2.5-VL-3B-Instruct_vision_nhwc.axmodel
│   ├── model.embed_tokens.weight.bfloat16.bin
│   ├── qwen2_5_vl_p512_l0_together.axmodel
......
│   ├── qwen2_5_vl_p512_l9_together.axmodel
│   └── qwen2_5_vl_post.axmodel
├── qwen2_5-vl-tokenizer
│   ├── chat_template.json
│   ├── config.json
│   ├── generation_config.json
│   ├── merges.txt
│   ├── model.safetensors.index.json
│   ├── preprocessor_config.json
│   ├── tokenizer.json
│   ├── tokenizer_config.json
│   └── vocab.json
├── qwen2_tokenizer_image_448.py
├── qwen2_tokenizer_video_308.py
├── run_qwen2_5_vl_image.sh
├── run_qwen2_5_vl_video.sh
└── video
    ├── frame_0075.jpg
......
    └── frame_0089.jpg
```
Prepare tokenizer server
Install transformers:

```
pip install transformers==4.41.1
```
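The main binary does not embed a tokenizer; the Python scripts in this repo expose one as a small local HTTP service that the runtime queries. A minimal sketch of that request/response pattern, using only the stdlib and a trivial stand-in "tokenizer" (the real scripts instead load the files under qwen2_5-vl-tokenizer/ with transformers, and the endpoint names here are illustrative):

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Stand-in "tokenizer": maps UTF-8 bytes to ids. The real server wraps a
# transformers tokenizer loaded from the qwen2_5-vl-tokenizer/ directory.
def encode(text: str) -> list[int]:
    return list(text.encode("utf-8"))

def decode(ids: list[int]) -> str:
    return bytes(ids).decode("utf-8")

class TokenizerHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        body = json.loads(self.rfile.read(int(self.headers["Content-Length"])))
        if self.path == "/encode":
            resp = {"ids": encode(body["text"])}
        else:  # treat anything else as /decode
            resp = {"text": decode(body["ids"])}
        payload = json.dumps(resp).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, *args):  # silence per-request logging
        pass

def serve(port: int = 12345) -> HTTPServer:
    """Start the tokenizer service on localhost in a background thread."""
    server = HTTPServer(("127.0.0.1", port), TokenizerHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

This is why the tokenizer server must be started (on the port the run script expects, 12345 by default) before launching the demo binary.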
Demo Run
Image understanding demo
Start the tokenizer server for the image understanding demo:

```
python3 qwen2_tokenizer_image_448.py --port 12345
```
Run the image understanding demo:
- Input text: 描述下图片 ("Describe the image")
- Input image: image/ssd_car.jpg
```
root@ax650:/mnt/qtang/llm-test/qwen2.5-vl-3b# ./run_qwen2_5_vl_image.sh
[I][ Init][ 129]: LLM init start
bos_id: -1, eos_id: 151645
2% | █ | 1 / 40 [0.01s<0.24s, 166.67 count/s] tokenizer init ok
[I][ Init][ 26]: LLaMaEmbedSelector use mmap
100% | ████████████████████████████████ | 40 / 40 [38.23s<38.23s, 1.05 count/s] init vpm axmodel ok,remain_cmm(7600 MB)
[I][ Init][ 277]: max_token_len : 1023
[I][ Init][ 282]: kv_cache_size : 256, kv_cache_num: 1023
[I][ Init][ 290]: prefill_token_num : 320
[I][ Init][ 292]: vpm_height : 1024,vpm_width : 392
[I][ Init][ 301]: LLM init ok
Type "q" to exit, Ctrl+c to stop current running
prompt >> who are you?
image >>
[I][ Run][ 638]: ttft: 2854.47 ms
I am a large language model created by Alibaba Cloud. I am called Qwen.
[N][ Run][ 779]: hit eos,avg 6.05 token/s
prompt >> 描述下图片
image >> image/ssd_car.jpg
[I][ Encode][ 416]: image encode time : 795.614014 ms, size : 524288
[I][ Run][ 638]: ttft: 2856.88 ms
这张图片展示了一条繁忙的城市街道。前景中,一名女子站在人行道上,她穿着黑色外套,面带微笑。她旁边是一辆红色的双层巴士,巴士上有一个广告,上面写着"THINGS GET MORE EXITING WHEN YOU SAY "YES""。巴士的车牌号是"L15"。巴士旁边停着一辆黑色的小型货车。背景中可以看到一些商店和行人,街道两旁的建筑物是现代的玻璃幕墙建筑。整体氛围显得繁忙而充满活力。
[N][ Run][ 779]: hit eos,avg 5.96 token/s
```

(The second prompt, 描述下图片, means "Describe the image". The Chinese answer translates as: "This image shows a busy city street. In the foreground, a woman stands on the sidewalk wearing a black coat and smiling. Beside her is a red double-decker bus carrying an advertisement that reads "THINGS GET MORE EXITING WHEN YOU SAY 'YES'"; its license plate is "L15". A small black van is parked next to the bus. Shops and pedestrians are visible in the background, and the street is lined with modern glass-curtain-wall buildings. The overall atmosphere is busy and full of life.")
Video understanding demo
Pre-process the frames of the video file into 308x308 images before running the demo.
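One way to produce such frames is ffmpeg: sample the clip at a fixed rate and scale each frame to 308x308. A sketch that only builds the command string (file names and the sampling rate are illustrative; the demo expects eight frames, as in the log below):

```python
import shlex

def ffmpeg_cmd(video: str, out_dir: str, fps: float = 2.0, size: int = 308) -> str:
    """Build an ffmpeg command that samples `fps` frames per second and
    scales each frame to size x size, matching the 308x308 model input."""
    args = [
        "ffmpeg", "-i", video,
        "-vf", f"fps={fps},scale={size}:{size}",
        f"{out_dir}/frame_%04d.jpg",
    ]
    return shlex.join(args)

print(ffmpeg_cmd("clip.mp4", "video"))
```

Note that squashing a non-square frame to 308x308 changes its aspect ratio; cropping first (or letterboxing) is an alternative if that matters for your content.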
Start the tokenizer server for the video understanding demo:

```
python qwen2_tokenizer_video_308.py --port 12345
```

Run the video understanding demo:
```
root@ax650:/mnt/qtang/llm-test/qwen2.5-vl-3b# ./run_qwen2_5_vl_video.sh
[I][ Init][ 129]: LLM init start
bos_id: -1, eos_id: 151645
2% | █ | 1 / 40 [0.00s<0.12s, 333.33 count/s] tokenizer init ok
[I][ Init][ 26]: LLaMaEmbedSelector use mmap
100% | ████████████████████████████████ | 40 / 40 [40.05s<40.05s, 1.00 count/s] init vpm axmodel ok,remain_cmm(7680 MB)
[I][ Init][ 277]: max_token_len : 1023
[I][ Init][ 282]: kv_cache_size : 256, kv_cache_num: 1023
[I][ Init][ 290]: prefill_token_num : 512
[I][ Init][ 292]: vpm_height : 484,vpm_width : 392
[I][ Init][ 301]: LLM init ok
Type "q" to exit, Ctrl+c to stop current running
prompt >> 描述下视频
image >> video
video/frame_0000.jpg
video/frame_0008.jpg
video/frame_0016.jpg
video/frame_0024.jpg
video/frame_0032.jpg
video/frame_0040.jpg
video/frame_0048.jpg
video/frame_0056.jpg
[I][ Encode][ 416]: image encode time : 1487.557007 ms, size : 991232
[I][ Run][ 638]: ttft: 5488.29 ms
视频展示了两只松鼠在户外的场景。背景是模糊的山脉和蓝天,前景中有松鼠在互动。松鼠的毛色主要是棕色和白色,它们的爪子是橙色的。松鼠似乎在互相玩耍或争抢,它们的爪子和嘴巴都伸向对方。整个场景显得非常自然和生动。
```

(The prompt, 描述下视频, means "Describe the video". The answer translates as: "The video shows two squirrels in an outdoor scene. The background is blurred mountains and blue sky, with the squirrels interacting in the foreground. Their fur is mainly brown and white, and their paws are orange. The squirrels seem to be playing or tussling with each other, with their paws and mouths stretched toward one another. The whole scene looks very natural and lively.")
Inference with M.2 Accelerator card
What is the M.2 Accelerator card? This demo is shown running on a Raspberry Pi 5.
TODO