qqc1989 committed
Commit e39a938 · verified · 1 Parent(s): 3be80c4

Update README.md

Files changed (1): README.md (+87 −42)
README.md CHANGED
@@ -6,28 +6,30 @@ base_model:
 tags:
 - InternVL2_5
 - InternVL2_5-1B
+- InternVL2_5-1B-MPO
 - Int8
 - VLM
+pipeline_tag: image-text-to-text
 ---
 
-# InternVL2_5-1B-Int8
+# InternVL2_5-1B-MPO
 
-This version of InternVL2_5-1B has been converted to run on the Axera NPU using **w8a16** quantization.
+This version of InternVL2_5-1B-MPO has been converted to run on the Axera NPU using **w8a16** quantization.
 
 This model has been optimized with the following LoRA:
 
-Compatible with Pulsar2 version: 3.3
+Compatible with Pulsar2 version: 4.1
 
 ## Conversion tool links:
 
 For those interested in model conversion, you can try exporting the axmodel from the original repo:
 https://huggingface.co/OpenGVLab/InternVL2_5-1B-MPO
 
-[Pulsar2 Link, How to Convert LLM from Huggingface to axmodel](https://pulsar2-docs.readthedocs.io/en/latest/appendix/build_llm.html)
+[How to Convert LLM from Huggingface to axmodel](https://github.com/AXERA-TECH/InternVL2_5-1B-MPO.axera/tree/master/model_convert)
 
-[AXera NPU HOST LLM Runtime](https://github.com/AXERA-TECH/ax-llm/tree/internvl2)
+[AXera NPU HOST LLM Runtime](https://github.com/AXERA-TECH/ax-llm/tree/ax-internvl)
 
-[AXera NPU AXCL LLM Runtime](https://github.com/AXERA-TECH/ax-llm/tree/axcl-llm-internvl)
+[AXera NPU AXCL LLM Runtime](https://github.com/AXERA-TECH/ax-llm/tree/axcl-internvl)
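The **w8a16** scheme used for this conversion stores weights as int8 with per-channel scales while activations stay in 16-bit float. A toy NumPy sketch of the idea, illustrative only and not Pulsar2's actual quantization kernel:

```
# Toy illustration of w8a16: int8 weights + float16 activations.
# Conceptual sketch only; not Pulsar2's real quantizer.
import numpy as np

def quantize_w8(w):
    """Symmetric per-output-channel int8 quantization of a weight matrix."""
    scale = np.abs(w).max(axis=1, keepdims=True) / 127.0     # one scale per row
    w_q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return w_q, scale.astype(np.float16)

def linear_w8a16(x_f16, w_q, scale):
    """y = x @ W^T with weights dequantized on the fly; activations stay fp16."""
    w_f16 = w_q.astype(np.float16) * scale                   # dequantize weights
    return x_f16 @ w_f16.T

w = np.random.randn(8, 16).astype(np.float32)
x = np.random.randn(1, 16).astype(np.float16)
w_q, s = quantize_w8(w)
# Quantization error stays small relative to an fp16 reference matmul:
print(np.abs(linear_w8a16(x, w_q, s) - x @ w.astype(np.float16).T).max())
```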
 
 ## Support Platform
 
@@ -41,10 +43,9 @@ https://huggingface.co/OpenGVLab/InternVL2_5-1B-MPO
 |AX650| 350 ms | 420 ms |32 tokens/sec|
 
 - AX630C
-- AX630C DEMO Board
-- [M4N-Dock(爱芯派Pro)](https://wiki.sipeed.com/hardware/zh/maixIV/m4ndock/m4ndock.html)
-- [M.2 Accelerator card](https://axcl-docs.readthedocs.io/zh-cn/latest/doc_guide_hardware.html)
-- AX630C
+- [爱芯派2](https://axera-pi-2-docs-cn.readthedocs.io/zh-cn/latest/index.html)
+- [Module-LLM](https://docs.m5stack.com/zh_CN/module/Module-LLM)
+- [LLM630 Compute Kit](https://docs.m5stack.com/zh_CN/core/LLM630%20Compute%20Kit)
 
 |Chips|image encoder 364|ttft|w8a16|
 |--|--|--|--|
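For a rough feel of end-to-end latency, the AX650 row above can be combined additively: image encode, plus time to first token, plus generated tokens divided by decode speed. A quick sketch; the back-to-back, non-overlapping assumption is a simplification:

```
# Back-of-the-envelope latency estimate from the AX650 row above.
# Assumption: the three stages run back-to-back with no overlap.
IMAGE_ENCODE_MS = 350.0   # image encoder stage, from the table
TTFT_MS = 420.0           # time to first token, from the table
DECODE_TPS = 32.0         # w8a16 decode speed, tokens/sec

def total_seconds(output_tokens: int) -> float:
    return (IMAGE_ENCODE_MS + TTFT_MS) / 1000.0 + output_tokens / DECODE_TPS

for n in (16, 64, 128):
    print(f"{n:3d} output tokens -> ~{total_seconds(n):.1f} s")  # 128 -> ~4.8 s
```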
@@ -55,15 +56,23 @@ https://huggingface.co/OpenGVLab/InternVL2_5-1B-MPO
 Download all files from this repository to the device
 
 ```
-root@ax630c:InternVL2_5-1B-MPO-AX630C# tree -L 1
+root@ax650:/mnt/qtang/llm-test/internvl2_5-1b-mpo# tree -L 1
 .
+|-- README.md
 |-- config.json
+|-- image1.jpg
 |-- internvl2_5_1b_364_ax630c
+|-- internvl2_5_1b_448_ax650
 |-- internvl2_5_tokenizer
 |-- internvl2_5_tokenizer_364.py
+|-- internvl2_5_tokenizer_448.py
 |-- main
+|-- main_ax650
+|-- post_config.json
 |-- run_internvl2_5_364_ax630c.sh
-`-- image1.jpg
+`-- run_internvl2_5_448_ax650.sh
+
+3 directories, 10 files
 ```
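If the device has Python and network access, one alternative to a manual download is `huggingface_hub`. The repo id below is assumed from this page; adjust it if it differs:

```
# Pull every file of this repository onto the device in one call.
# Assumes `pip install huggingface_hub`; repo id and target dir are assumptions.
from huggingface_hub import snapshot_download

path = snapshot_download(
    repo_id="AXERA-TECH/InternVL2_5-1B-MPO",  # adjust if the repo id differs
    local_dir="./internvl2_5-1b-mpo",
)
print("downloaded to", path)
```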
 
 #### Install transformers
@@ -75,16 +84,16 @@ pip install transformers==4.41.1
 #### Start the Tokenizer service
 
 ```
-(vllm) lihongjie@gn4:InternVL2_5-1B-MPO-AX630C$ python internvl2_5_tokenizer_364.py --host localhost
-None None 151645 <|im_end|>
-[151644, 8948, 198, 56568, 104625, 100633, 104455, 104800, 101101, 32022, 102022, 99602, 100013, 9370, 90286, 21287, 42140, 53772, 35243, 26288, 104949, 3837, 105205, 109641, 67916, 30698, 11, 54851, 46944, 115404, 42192, 99441, 100623, 48692, 100168, 110498, 1773, 151645, 151644, 872, 198, 151665, 151667, 151667, ..., 151667, 151666, 198, 5501, 7512, 279, 2168, 19620, 13, 151645, 151644, 77091, 198]
-223
-[151644, 8948, 198, 56568, 104625, 100633, 104455, 104800, 101101, 32022, 102022, 99602, 100013, 9370, 90286, 21287, 42140, 53772, 35243, 26288, 104949, 3837, 105205, 109641, 67916, 30698, 11, 54851, 46944, 115404, 42192, 99441, 100623, 48692, 100168, 110498, 1773, 151645, 151644, 872, 198, 14990, 1879, 151645, 151644, 77091, 198]
-47
-http://localhost:8080
+root@ax650:/mnt/qtang/llm-test/internvl2_5-1b-mpo# python3 internvl2_5_tokenizer_448.py
+None None 151645 <|im_end|> 151665 151667
+context_len is 256
+prompt is <|im_start|>system
+你是书生·万象, 英文名是InternVL, 是由上海人工智能实验室、清华大学及多家合作单位联合开发的多模态大语言模型.<|im_end|>
+.......
+http://0.0.0.0:12345
 ```
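The ids the service prints at startup (151645 for `<|im_end|>`, plus 151665 and 151667) come from the bundled tokenizer. A quick way to inspect them, assuming `internvl2_5_tokenizer` is a standard Hugging Face tokenizer directory and that the image tokens follow InternVL's usual `<img>`/`<IMG_CONTEXT>` naming (both assumptions, not confirmed by this page):

```
# Inspect the special tokens the tokenizer service prints at startup.
# Assumes ./internvl2_5_tokenizer loads with transformers==4.41.1 (installed above)
# and that 151665/151667 correspond to InternVL-style <img>/<IMG_CONTEXT> tokens.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("./internvl2_5_tokenizer", trust_remote_code=True)
print("eos:", tok.eos_token, tok.eos_token_id)       # expect <|im_end|> 151645
for name in ("<img>", "<IMG_CONTEXT>"):              # assumed token names
    print(name, "->", tok.convert_tokens_to_ids(name))
```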
 
-#### Inference with AX630C Host, such as M4N-Dock(爱芯派Pro) or AX630C DEMO Board
+#### Inference with AX650 Host, such as M4N-Dock(爱芯派Pro) or AX650 DEMO Board
 
 - input text
 
@@ -94,35 +103,71 @@ Describe the picture
 
 - input image
 
-![](./ssd_car.jpg)
+![](./image1.jpg)
 
-Open another terminal and run `./run_internvl2_5_364_ax630c.sh`
+Open another terminal and run `./run_internvl2_5_448_ax650.sh`
 
 ```
-root@ax630c:InternVL2_5-1B-MPO-AX630C# bash run_internvl2_5_364_ax630c.sh
-[I][ Init][ 106]: LLM init start
+root@ax650:/mnt/qtang/llm-test/internvl2_5-1b-mpo# ./run_internvl2_5_448_ax650.sh
+[I][ Init][ 134]: LLM init start
+[I][ Init][ 34]: connect http://0.0.0.0:12345 ok
 bos_id: -1, eos_id: 151645
-3% | ██ | 1 / 28 [0.17s<4.90s, 5.71 count/s] tokenizer init ok
-[I][ Init][ 26]: LLaMaEmbedSelector use mmap
-100% | ████████████████████████████████ | 28 / 28 [5.41s<5.41s, 5.17 count/s] init vpm axmodel ok, remain_cmm(907 MB)
-[I][ Init][ 254]: max_token_len : 1023
-[I][ Init][ 259]: kv_cache_size : 128, kv_cache_num: 1023
-[I][ Init][ 267]: prefill_token_num : 256
-[I][ Init][ 269]: vpm_height : 364, vpm_width : 364
-[I][ Init][ 278]: LLM init ok
+img_start_token: 151665
+img_context_token: 151667
+3% | ██ | 1 / 27 [0.01s<0.30s, 90.91 count/s] tokenizer init ok
+[I][ Init][ 45]: LLaMaEmbedSelector use mmap
+7% | ███ | 2 / 27 [0.01s<0.19s, 142.86 count/s] embed_selector init ok
+100% | ████████████████████████████████ | 27 / 27 [4.31s<4.31s, 6.26 count/s] init post axmodel ok, remain_cmm(3881 MB)
+[I][ Init][ 226]: IMAGE_CONTEXT_TOKEN: 151667, IMAGE_START_TOKEN: 151665
+[I][ Init][ 251]: image encoder input nchw@float32
+[I][ Init][ 281]: image encoder output float32
+[I][ Init][ 291]: image_encoder_height : 448, image_encoder_width: 448
+[I][ Init][ 293]: max_token_len : 2559
+[I][ Init][ 296]: kv_cache_size : 128, kv_cache_num: 2559
+[I][ Init][ 304]: prefill_token_num : 128
+[I][ Init][ 308]: grp: 1, prefill_max_token_num : 1
+[I][ Init][ 308]: grp: 2, prefill_max_token_num : 128
+[I][ Init][ 308]: grp: 3, prefill_max_token_num : 256
+[I][ Init][ 308]: grp: 4, prefill_max_token_num : 384
+[I][ Init][ 308]: grp: 5, prefill_max_token_num : 512
+[I][ Init][ 308]: grp: 6, prefill_max_token_num : 640
+[I][ Init][ 308]: grp: 7, prefill_max_token_num : 768
+[I][ Init][ 308]: grp: 8, prefill_max_token_num : 896
+[I][ Init][ 308]: grp: 9, prefill_max_token_num : 1024
+[I][ Init][ 312]: prefill_max_token_num : 1024
+[I][ load_config][ 282]: load config:
+{
+    "enable_repetition_penalty": false,
+    "enable_temperature": true,
+    "enable_top_k_sampling": true,
+    "enable_top_p_sampling": false,
+    "penalty_window": 20,
+    "repetition_penalty": 1.2,
+    "temperature": 0.9,
+    "top_k": 10,
+    "top_p": 0.8
+}
+
+[I][ Init][ 321]: LLM init ok
 Type "q" to exit, Ctrl+c to stop current running
-prompt >> Please describe the image shortly.
+prompt >> Describe the picture
 image >> image1.jpg
-[I][ Encode][ 337]: image encode time : 1768.706055 ms, size : 151424
-[I][ Run][ 548]: ttft: 1123.02 ms
-The image shows a red panda resting on a wooden platform. It has a reddish-brown fur coat with white markings around its eyes and ears. The background features green foliage.
+[I][ Encode][ 415]: image encode time : 395.42 ms, size : 229376
+[I][ Encode][ 524]: idx:0 offset : 48 out_embed.size() : 277760
+[I][ Run][ 551]: input token num : 310, prefill_split_num : 3
+[I][ Run][ 566]: prefill grpid 4
+[I][ Run][ 593]: input_num_token:128
+[I][ Run][ 593]: input_num_token:128
+[I][ Run][ 593]: input_num_token:54
+[I][ Run][ 717]: ttft: 625.86 ms
 
-[N][ Run][ 687]: hit eos,avg 2.42 token/s
+: The image features a red panda sitting in a tree with a blurred green background indicating foliage.
+The red panda has a distinctive reddish-brown head and back, white underparts, and black patches around its eyes,
+nose, and mouth. It appears to be resting or lounging comfortably on a wooden platform.
 
-prompt >>
-```
+[N][ Run][ 826]: hit eos,avg 27.37 token/s
 
-#### Inference with M.2 Accelerator card
+prompt >> q
 
-[What is M.2 Accelerator card?](https://axcl-docs.readthedocs.io/zh-cn/latest/doc_guide_hardware.html) This demo runs on a Raspberry Pi 5.
-
-*TODO*
+```
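The run log above also shows how prefill is chunked: 310 input tokens select grp 4 (prefill_max_token_num 384) and are pushed through in 128-token slices (128 + 128 + 54). A toy reconstruction of that arithmetic, with the group table copied from the Init log; this is not the runtime's actual code:

```
# Toy reconstruction of the prefill chunking visible in the log above.
# Group capacities copied from the Init log; chunk size = prefill_token_num (128).
PREFILL_GROUPS = [1, 128, 256, 384, 512, 640, 768, 896, 1024]  # grp 1..9
CHUNK = 128

def plan_prefill(n_tokens: int):
    """Pick the smallest group that fits, then split into CHUNK-sized slices.
    Only valid up to prefill_max_token_num (1024)."""
    grpid = next(i + 1 for i, cap in enumerate(PREFILL_GROUPS) if cap >= n_tokens)
    chunks = [min(CHUNK, n_tokens - off) for off in range(0, n_tokens, CHUNK)]
    return grpid, chunks

print(plan_prefill(310))   # -> (4, [128, 128, 54]), matching the log
```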
 