---
base_model:
- OpenGVLab/InternVL2_5-1B-MPO
tags:
- InternVL2_5
- InternVL2_5-1B
- Int8
- VLM
---
# InternVL2_5-1B-MPO

This version of InternVL2_5-1B has been converted to run on the Axera NPU using **w8a16** quantization.

This model has been optimized with the following LoRA:

Compatible with Pulsar2 version:

## Convert tools links:

For those who are interested in model conversion, you can try to export the axmodel through the original repo:
https://huggingface.co/OpenGVLab/InternVL2_5-1B-MPO

[AXera NPU HOST LLM Runtime](https://github.com/AXERA-TECH/ax-llm/tree/ax-internvl)

[AXera NPU AXCL LLM Runtime](https://github.com/AXERA-TECH/ax-llm/tree/axcl-internvl)
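If you plan to run the conversion yourself, it can help to first verify that the original FP checkpoint loads with stock `transformers`. This is ordinary Hugging Face loading, separate from the Pulsar2 export itself; `trust_remote_code` is needed because InternVL ships custom modeling code:

```
# A quick sanity check of the source checkpoint (illustrative, not part of the export):
from transformers import AutoModel, AutoTokenizer

path = "OpenGVLab/InternVL2_5-1B-MPO"
model = AutoModel.from_pretrained(path, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True, use_fast=False)
print(type(model).__name__)
```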
## Support Platform

|Chips|image encoder 448|ttft|w8a16|
|--|--|--|--|
|AX650| 350 ms | 420 ms |32 tokens/sec|

- AX630C
  - [爱芯派2](https://axera-pi-2-docs-cn.readthedocs.io/zh-cn/latest/index.html)
  - [Module-LLM](https://docs.m5stack.com/zh_CN/module/Module-LLM)
  - [LLM630 Compute Kit](https://docs.m5stack.com/zh_CN/core/LLM630%20Compute%20Kit)

|Chips|image encoder 364|ttft|w8a16|
|--|--|--|--|

Download all files from this repository to the device
```
root@ax650:/mnt/qtang/llm-test/internvl2_5-1b-mpo# tree -L 1
.
|-- README.md
|-- config.json
|-- image1.jpg
|-- internvl2_5_1b_364_ax630c
|-- internvl2_5_1b_448_ax650
|-- internvl2_5_tokenizer
|-- internvl2_5_tokenizer_364.py
|-- internvl2_5_tokenizer_448.py
|-- main
|-- main_ax650
|-- post_config.json
|-- run_internvl2_5_364_ax630c.sh
`-- run_internvl2_5_448_ax650.sh

3 directories, 10 files
```
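If you would rather script the download than use the web UI, a sketch with `huggingface_hub` (the `repo_id` below is assumed from this model card and may need adjusting):

```
# Fetch the repository files locally, then copy the folder to the device.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="AXERA-TECH/InternVL2_5-1B-MPO",  # assumed repo id for this card
    local_dir="internvl2_5-1b-mpo",
)
```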
#### Install transformer
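The tokenizer scripts in this repository depend on the `transformers` package, pinned to the version below:

```
pip install transformers==4.41.1
```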
#### Start the Tokenizer service
```
root@ax650:/mnt/qtang/llm-test/internvl2_5-1b-mpo# python3 internvl2_5_tokenizer_448.py
None None 151645 <|im_end|> 151665 151667
context_len is 256
prompt is <|im_start|>system
你是书生·万象, 英文名是InternVL, 是由上海人工智能实验室、清华大学及多家合作单位联合开发的多模态大语言模型.<|im_end|>
.......
http://0.0.0.0:12345
```
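Use the bundled `internvl2_5_tokenizer_448.py` for the 448 model (or `internvl2_5_tokenizer_364.py` for AX630C). Conceptually, the script loads the tokenizer shipped in `internvl2_5_tokenizer` and exposes it over HTTP on port 12345 for the runtime to call. A minimal sketch of that idea, with made-up request/response fields rather than the actual protocol the runtime expects:

```
# Sketch only -- run internvl2_5_tokenizer_448.py from this repo for real use.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("./internvl2_5_tokenizer", trust_remote_code=True)

class TokenizeHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Illustrative body format: {"text": "..."} -> {"ids": [...]}
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        ids = tokenizer.encode(payload.get("text", ""))
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(json.dumps({"ids": ids}).encode())

if __name__ == "__main__":
    # The run scripts connect to http://0.0.0.0:12345 (see the log below).
    HTTPServer(("0.0.0.0", 12345), TokenizeHandler).serve_forever()
```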
#### Inference with AX650 Host, such as M4N-Dock(爱芯派Pro) or AX650 DEMO Board
- input text

Describe the picture

- input image

Open another terminal and run `./run_internvl2_5_448_ax650.sh`
```
root@ax650:/mnt/qtang/llm-test/internvl2_5-1b-mpo# ./run_internvl2_5_448_ax650.sh
[I][ Init][ 134]: LLM init start
[I][ Init][ 34]: connect http://0.0.0.0:12345 ok
bos_id: -1, eos_id: 151645
img_start_token: 151665
img_context_token: 151667
3% | ██ | 1 / 27 [0.01s<0.30s, 90.91 count/s] tokenizer init ok
[I][ Init][ 45]: LLaMaEmbedSelector use mmap
7% | ███ | 2 / 27 [0.01s<0.19s, 142.86 count/s] embed_selector init ok
100% | ████████████████████████████████ | 27 / 27 [4.31s<4.31s, 6.26 count/s] init post axmodel ok,remain_cmm(3881 MB)
[I][ Init][ 226]: IMAGE_CONTEXT_TOKEN: 151667, IMAGE_START_TOKEN: 151665
[I][ Init][ 251]: image encoder input nchw@float32
[I][ Init][ 281]: image encoder output float32

[I][ Init][ 291]: image_encoder_height : 448, image_encoder_width: 448
[I][ Init][ 293]: max_token_len : 2559
[I][ Init][ 296]: kv_cache_size : 128, kv_cache_num: 2559
[I][ Init][ 304]: prefill_token_num : 128
[I][ Init][ 308]: grp: 1, prefill_max_token_num : 1
[I][ Init][ 308]: grp: 2, prefill_max_token_num : 128
[I][ Init][ 308]: grp: 3, prefill_max_token_num : 256
[I][ Init][ 308]: grp: 4, prefill_max_token_num : 384
[I][ Init][ 308]: grp: 5, prefill_max_token_num : 512
[I][ Init][ 308]: grp: 6, prefill_max_token_num : 640
[I][ Init][ 308]: grp: 7, prefill_max_token_num : 768
[I][ Init][ 308]: grp: 8, prefill_max_token_num : 896
[I][ Init][ 308]: grp: 9, prefill_max_token_num : 1024
[I][ Init][ 312]: prefill_max_token_num : 1024
[I][ load_config][ 282]: load config:
{
"enable_repetition_penalty": false,
"enable_temperature": true,
"enable_top_k_sampling": true,
"enable_top_p_sampling": false,
"penalty_window": 20,
"repetition_penalty": 1.2,
"temperature": 0.9,
"top_k": 10,
"top_p": 0.8
}

[I][ Init][ 321]: LLM init ok
Type "q" to exit, Ctrl+c to stop current running
prompt >> Describe the picture
image >> image1.jpg
[I][ Encode][ 415]: image encode time : 395.42 ms, size : 229376
[I][ Encode][ 524]: idx:0 offset : 48 out_embed.size() : 277760
[I][ Run][ 551]: input token num : 310, prefill_split_num : 3
[I][ Run][ 566]: prefill grpid 4
[I][ Run][ 593]: input_num_token:128
[I][ Run][ 593]: input_num_token:128
[I][ Run][ 593]: input_num_token:54
[I][ Run][ 717]: ttft: 625.86 ms

: The image features a red panda sitting in a tree with a blurred green background indicating foliage.
The red panda has a distinctive reddish-brown head and back, white underparts, and black patches around its eyes,
nose, and mouth. It appears to be resting or lounging comfortably on a wooden platform.

[N][ Run][ 826]: hit eos,avg 27.37 token/s

prompt >> q
```
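The `load config` block in the log is `post_config.json`: with these settings the runtime rescales logits by `temperature` and samples from the `top_k` most likely tokens, while top-p sampling and the repetition penalty stay disabled. A rough sketch of that sampling step (not the runtime's actual code):

```
# Illustration of temperature + top-k sampling as configured in post_config.json.
import numpy as np

def sample_top_k(logits, temperature=0.9, top_k=10, rng=np.random.default_rng()):
    scaled = np.asarray(logits, dtype=np.float64) / temperature
    top = np.argsort(scaled)[-top_k:]                # indices of the k highest logits
    probs = np.exp(scaled[top] - scaled[top].max())  # softmax over the kept candidates
    probs /= probs.sum()
    return int(rng.choice(top, p=probs))
```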