qqc1989 committed
Commit e39a938 · verified · 1 Parent(s): 3be80c4

Update README.md

Files changed (1): README.md (+87 −42)
README.md CHANGED
@@ -6,28 +6,30 @@ base_model:
 tags:
 - InternVL2_5
 - InternVL2_5-1B
+- InternVL2_5-1B-MPO
 - Int8
 - VLM
+pipeline_tag: image-text-to-text
 ---
 
-# InternVL2_5-1B-Int8
+# InternVL2_5-1B-MPO
 
-This version of InternVL2_5-1B has been converted to run on the Axera NPU using **w8a16** quantization.
+This version of InternVL2_5-1B-MPO has been converted to run on the Axera NPU using **w8a16** quantization.
 
 This model has been optimized with the following LoRA:
 
-Compatible with Pulsar2 version: 3.3
+Compatible with Pulsar2 version: 4.1
 
 ## Conversion tool links:
 
 For those interested in model conversion, you can try exporting the axmodel from the original repo:
 https://huggingface.co/OpenGVLab/InternVL2_5-1B-MPO
 
-[Pulsar2 Link, How to Convert LLM from Huggingface to axmodel](https://pulsar2-docs.readthedocs.io/en/latest/appendix/build_llm.html)
+[How to Convert LLM from Huggingface to axmodel](https://github.com/AXERA-TECH/InternVL2_5-1B-MPO.axera/tree/master/model_convert)
 
-[AXera NPU HOST LLM Runtime](https://github.com/AXERA-TECH/ax-llm/tree/internvl2)
+[AXera NPU HOST LLM Runtime](https://github.com/AXERA-TECH/ax-llm/tree/ax-internvl)
 
-[AXera NPU AXCL LLM Runtime](https://github.com/AXERA-TECH/ax-llm/tree/axcl-llm-internvl)
+[AXera NPU AXCL LLM Runtime](https://github.com/AXERA-TECH/ax-llm/tree/axcl-internvl)
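The **w8a16** scheme used for this conversion stores weights as int8 with per-channel scales while activations stay in 16-bit float. A toy NumPy sketch of the idea, illustrative only and not Pulsar2's actual quantization kernel:

```
# Toy illustration of w8a16: int8 weights + float16 activations.
# Conceptual sketch only; not Pulsar2's real quantizer.
import numpy as np

def quantize_w8(w):
    """Symmetric per-output-channel int8 quantization of a weight matrix."""
    scale = np.abs(w).max(axis=1, keepdims=True) / 127.0     # one scale per row
    w_q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return w_q, scale.astype(np.float16)

def linear_w8a16(x_f16, w_q, scale):
    """y = x @ W^T with weights dequantized on the fly; activations stay fp16."""
    w_f16 = w_q.astype(np.float16) * scale                   # dequantize weights
    return x_f16 @ w_f16.T

w = np.random.randn(8, 16).astype(np.float32)
x = np.random.randn(1, 16).astype(np.float16)
w_q, s = quantize_w8(w)
# Quantization error stays small relative to an fp16 reference matmul:
print(np.abs(linear_w8a16(x, w_q, s) - x @ w.astype(np.float16).T).max())
```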
 
 ## Support Platform
 
@@ -41,10 +43,9 @@ https://huggingface.co/OpenGVLab/InternVL2_5-1B-MPO
 |AX650| 350 ms | 420 ms |32 tokens/sec|
 
 - AX630C
-- AX630C DEMO Board
-- [M4N-Dock(爱芯派Pro)](https://wiki.sipeed.com/hardware/zh/maixIV/m4ndock/m4ndock.html)
-- [M.2 Accelerator card](https://axcl-docs.readthedocs.io/zh-cn/latest/doc_guide_hardware.html)
-- AX630C
+- [爱芯派2](https://axera-pi-2-docs-cn.readthedocs.io/zh-cn/latest/index.html)
+- [Module-LLM](https://docs.m5stack.com/zh_CN/module/Module-LLM)
+- [LLM630 Compute Kit](https://docs.m5stack.com/zh_CN/core/LLM630%20Compute%20Kit)
 
 |Chips|image encoder 364|ttft|w8a16|
 |--|--|--|--|
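For a rough feel of end-to-end latency, the AX650 row above can be combined additively: image encode, plus time to first token, plus generated tokens divided by decode speed. A quick sketch; the back-to-back, non-overlapping assumption is a simplification:

```
# Back-of-the-envelope latency estimate from the AX650 row above.
# Assumption: the three stages run back-to-back with no overlap.
IMAGE_ENCODE_MS = 350.0   # image encoder stage, from the table
TTFT_MS = 420.0           # time to first token, from the table
DECODE_TPS = 32.0         # w8a16 decode speed, tokens/sec

def total_seconds(output_tokens: int) -> float:
    return (IMAGE_ENCODE_MS + TTFT_MS) / 1000.0 + output_tokens / DECODE_TPS

for n in (16, 64, 128):
    print(f"{n:3d} output tokens -> ~{total_seconds(n):.1f} s")  # 128 -> ~4.8 s
```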
@@ -55,15 +56,23 @@ https://huggingface.co/OpenGVLab/InternVL2_5-1B-MPO
 Download all files from this repository to the device
 
 ```
-root@ax630c:InternVL2_5-1B-MPO-AX630C# tree -L 1
+root@ax650:/mnt/qtang/llm-test/internvl2_5-1b-mpo# tree -L 1
 .
+|-- README.md
 |-- config.json
+|-- image1.jpg
 |-- internvl2_5_1b_364_ax630c
+|-- internvl2_5_1b_448_ax650
 |-- internvl2_5_tokenizer
 |-- internvl2_5_tokenizer_364.py
+|-- internvl2_5_tokenizer_448.py
 |-- main
+|-- main_ax650
+|-- post_config.json
 |-- run_internvl2_5_364_ax630c.sh
-`-- image1.jpg
+`-- run_internvl2_5_448_ax650.sh
+
+3 directories, 10 files
 ```
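If the device has Python and network access, one alternative to a manual download is `huggingface_hub`. The repo id below is assumed from this page; adjust it if it differs:

```
# Pull every file of this repository onto the device in one call.
# Assumes `pip install huggingface_hub`; repo id and target dir are assumptions.
from huggingface_hub import snapshot_download

path = snapshot_download(
    repo_id="AXERA-TECH/InternVL2_5-1B-MPO",  # adjust if the repo id differs
    local_dir="./internvl2_5-1b-mpo",
)
print("downloaded to", path)
```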
 
 #### Install transformers
@@ -75,16 +84,16 @@ pip install transformers==4.41.1
 #### Start the Tokenizer service
 
 ```
-(vllm) lihongjie@gn4:InternVL2_5-1B-MPO-AX630C$ python internvl2_5_tokenizer_364.py --host localhost
-None None 151645 <|im_end|>
-[151644, 8948, 198, 56568, 104625, 100633, 104455, 104800, 101101, 32022, 102022, 99602, 100013, 9370, 90286, 21287, 42140, 53772, 35243, 26288, 104949, 3837, 105205, 109641, 67916, 30698, 11, 54851, 46944, 115404, 42192, 99441, 100623, 48692, 100168, 110498, 1773, 151645, 151644, 872, 198, 151665, 151667, 151667, ..., 151667, 151666, 198, 5501, 7512, 279, 2168, 19620, 13, 151645, 151644, 77091, 198]
-223
-[151644, 8948, 198, 56568, 104625, 100633, 104455, 104800, 101101, 32022, 102022, 99602, 100013, 9370, 90286, 21287, 42140, 53772, 35243, 26288, 104949, 3837, 105205, 109641, 67916, 30698, 11, 54851, 46944, 115404, 42192, 99441, 100623, 48692, 100168, 110498, 1773, 151645, 151644, 872, 198, 14990, 1879, 151645, 151644, 77091, 198]
-47
-http://localhost:8080
+root@ax650:/mnt/qtang/llm-test/internvl2_5-1b-mpo# python3 internvl2_5_tokenizer_448.py
+None None 151645 <|im_end|> 151665 151667
+context_len is 256
+prompt is <|im_start|>system
+你是书生·万象, 英文名是InternVL, 是由上海人工智能实验室、清华大学及多家合作单位联合开发的多模态大语言模型.<|im_end|>
+.......
+http://0.0.0.0:12345
 ```
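The ids the service prints at startup (151645 for `<|im_end|>`, plus 151665 and 151667) come from the bundled tokenizer. A quick way to inspect them, assuming `internvl2_5_tokenizer` is a standard Hugging Face tokenizer directory and that the image tokens follow InternVL's usual `<img>`/`<IMG_CONTEXT>` naming (both assumptions, not confirmed by this page):

```
# Inspect the special tokens the tokenizer service prints at startup.
# Assumes ./internvl2_5_tokenizer loads with transformers==4.41.1 (installed above)
# and that 151665/151667 correspond to InternVL-style <img>/<IMG_CONTEXT> tokens.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("./internvl2_5_tokenizer", trust_remote_code=True)
print("eos:", tok.eos_token, tok.eos_token_id)       # expect <|im_end|> 151645
for name in ("<img>", "<IMG_CONTEXT>"):              # assumed token names
    print(name, "->", tok.convert_tokens_to_ids(name))
```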
 
-#### Inference with AX630C Host, such as M4N-Dock(爱芯派Pro) or AX630C DEMO Board
+#### Inference with AX650 Host, such as M4N-Dock(爱芯派Pro) or AX650 DEMO Board
 
 - input text
 
@@ -94,35 +103,71 @@ Describe the picture
 
 - input image
 
-![](./ssd_car.jpg)
+![](./image1.jpg)
 
-Open another terminal and run `./run_internvl2_5_364_ax630c.sh`
+Open another terminal and run `./run_internvl2_5_448_ax650.sh`
 
 ```
-root@ax630c:InternVL2_5-1B-MPO-AX630C# bash run_internvl2_5_364_ax630c.sh
-[I][ Init][ 106]: LLM init start
+root@ax650:/mnt/qtang/llm-test/internvl2_5-1b-mpo# ./run_internvl2_5_448_ax650.sh
+[I][ Init][ 134]: LLM init start
+[I][ Init][ 34]: connect http://0.0.0.0:12345 ok
 bos_id: -1, eos_id: 151645
-3% | ██ | 1 / 28 [0.17s<4.90s, 5.71 count/s] tokenizer init ok
-[I][ Init][ 26]: LLaMaEmbedSelector use mmap
-100% | ████████████████████████████████ | 28 / 28 [5.41s<5.41s, 5.17 count/s] init vpm axmodel ok, remain_cmm(907 MB)
-[I][ Init][ 254]: max_token_len : 1023
-[I][ Init][ 259]: kv_cache_size : 128, kv_cache_num: 1023
-[I][ Init][ 267]: prefill_token_num : 256
-[I][ Init][ 269]: vpm_height : 364, vpm_width : 364
-[I][ Init][ 278]: LLM init ok
+img_start_token: 151665
+img_context_token: 151667
+3% | ██ | 1 / 27 [0.01s<0.30s, 90.91 count/s] tokenizer init ok
+[I][ Init][ 45]: LLaMaEmbedSelector use mmap
+7% | ███ | 2 / 27 [0.01s<0.19s, 142.86 count/s] embed_selector init ok
+100% | ████████████████████████████████ | 27 / 27 [4.31s<4.31s, 6.26 count/s] init post axmodel ok, remain_cmm(3881 MB)
+[I][ Init][ 226]: IMAGE_CONTEXT_TOKEN: 151667, IMAGE_START_TOKEN: 151665
+[I][ Init][ 251]: image encoder input nchw@float32
+[I][ Init][ 281]: image encoder output float32
+[I][ Init][ 291]: image_encoder_height : 448, image_encoder_width: 448
+[I][ Init][ 293]: max_token_len : 2559
+[I][ Init][ 296]: kv_cache_size : 128, kv_cache_num: 2559
+[I][ Init][ 304]: prefill_token_num : 128
+[I][ Init][ 308]: grp: 1, prefill_max_token_num : 1
+[I][ Init][ 308]: grp: 2, prefill_max_token_num : 128
+[I][ Init][ 308]: grp: 3, prefill_max_token_num : 256
+[I][ Init][ 308]: grp: 4, prefill_max_token_num : 384
+[I][ Init][ 308]: grp: 5, prefill_max_token_num : 512
+[I][ Init][ 308]: grp: 6, prefill_max_token_num : 640
+[I][ Init][ 308]: grp: 7, prefill_max_token_num : 768
+[I][ Init][ 308]: grp: 8, prefill_max_token_num : 896
+[I][ Init][ 308]: grp: 9, prefill_max_token_num : 1024
+[I][ Init][ 312]: prefill_max_token_num : 1024
+[I][ load_config][ 282]: load config:
+{
+    "enable_repetition_penalty": false,
+    "enable_temperature": true,
+    "enable_top_k_sampling": true,
+    "enable_top_p_sampling": false,
+    "penalty_window": 20,
+    "repetition_penalty": 1.2,
+    "temperature": 0.9,
+    "top_k": 10,
+    "top_p": 0.8
+}
+
+[I][ Init][ 321]: LLM init ok
 Type "q" to exit, Ctrl+c to stop current running
-prompt >> Please describe the image shortly.
+prompt >> Describe the picture
 image >> image1.jpg
-[I][ Encode][ 337]: image encode time : 1768.706055 ms, size : 151424
-[I][ Run][ 548]: ttft: 1123.02 ms
-The image shows a red panda resting on a wooden platform. It has a reddish-brown fur coat with white markings around its eyes and ears. The background features green foliage.
+[I][ Encode][ 415]: image encode time : 395.42 ms, size : 229376
+[I][ Encode][ 524]: idx:0 offset : 48 out_embed.size() : 277760
+[I][ Run][ 551]: input token num : 310, prefill_split_num : 3
+[I][ Run][ 566]: prefill grpid 4
+[I][ Run][ 593]: input_num_token:128
+[I][ Run][ 593]: input_num_token:128
+[I][ Run][ 593]: input_num_token:54
+[I][ Run][ 717]: ttft: 625.86 ms
 
-[N][ Run][ 687]: hit eos,avg 2.42 token/s
+: The image features a red panda sitting in a tree with a blurred green background indicating foliage.
+The red panda has a distinctive reddish-brown head and back, white underparts, and black patches around its eyes,
+nose, and mouth. It appears to be resting or lounging comfortably on a wooden platform.
 
-prompt >>
-```
+[N][ Run][ 826]: hit eos,avg 27.37 token/s
 
-#### Inference with M.2 Accelerator card
+prompt >> q
 
-[What is M.2 Accelerator card?](https://axcl-docs.readthedocs.io/zh-cn/latest/doc_guide_hardware.html) This demo runs on a Raspberry Pi 5.
-
-*TODO*
+```
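The run log above also shows how prefill is chunked: 310 input tokens select grp 4 (prefill_max_token_num 384) and are pushed through in 128-token slices (128 + 128 + 54). A toy reconstruction of that arithmetic, with the group table copied from the Init log; this is not the runtime's actual code:

```
# Toy reconstruction of the prefill chunking visible in the log above.
# Group capacities copied from the Init log; chunk size = prefill_token_num (128).
PREFILL_GROUPS = [1, 128, 256, 384, 512, 640, 768, 896, 1024]  # grp 1..9
CHUNK = 128

def plan_prefill(n_tokens: int):
    """Pick the smallest group that fits, then split into CHUNK-sized slices.
    Only valid up to prefill_max_token_num (1024)."""
    grpid = next(i + 1 for i, cap in enumerate(PREFILL_GROUPS) if cap >= n_tokens)
    chunks = [min(CHUNK, n_tokens - off) for off in range(0, n_tokens, CHUNK)]
    return grpid, chunks

print(plan_prefill(310))   # -> (4, [128, 128, 54]), matching the log
```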
 