---
library_name: transformers
license: bsd-3-clause
base_model:
- OpenGVLab/InternVL3-1B
tags:
- InternVL3
- InternVL3-1B
- Int8
- VLM
pipeline_tag: image-text-to-text
language:
- en
---

# InternVL3-1B

This version of InternVL3-1B has been converted to run on the Axera NPU using **w8a16** quantization.

This model has been optimized with the following LoRA:

Compatible with Pulsar2 version: 4.1

## Conversion tool links

If you are interested in model conversion, you can try exporting the axmodel yourself from the original repo:
https://huggingface.co/OpenGVLab/InternVL3-1B

[How to Convert LLM from Huggingface to axmodel](https://github.com/AXERA-TECH/InternVL3-2B.axera/tree/master/model_convert)

[AXera NPU HOST LLM Runtime](https://github.com/AXERA-TECH/ax-llm/tree/ax-internvl)

[AXera NPU AXCL LLM Runtime](https://github.com/AXERA-TECH/ax-llm/tree/axcl-internvl)

## Supported platforms

- AX650
- AX650N DEMO Board
- [M4N-Dock(爱芯派Pro)](https://wiki.sipeed.com/hardware/zh/maixIV/m4ndock/m4ndock.html)
- [M.2 Accelerator card](https://axcl-docs.readthedocs.io/zh-cn/latest/doc_guide_hardware.html)

| Chip  | Image encoder (448×448) | TTFT   | Decode speed (w8a16) |
|-------|-------------------------|--------|----------------------|
| AX650 | 380 ms                  | 623 ms | 30 tokens/s          |

## How to use

Download all of the files in this repository to the device.

```
root@ax650:/mnt/qtang/llm-test/internvl3-1b# tree -L 1
.
|-- gradio_demo.py
|-- internvl3_1b_ax650
|-- internvl3_tokenizer
|-- internvl3_tokenizer.py
|-- main_api_ax650
|-- main_api_axcl_x86
|-- main_ax650
|-- main_axcl_x86
|-- post_config.json
|-- run_internvl_3_1b_448_api_ax650.sh
|-- run_internvl_3_1b_448_api_axcl_x86.sh
|-- run_internvl_3_1b_448_ax650.sh
|-- run_internvl_3_1b_448_axcl_x86.sh
`-- ssd_car.jpg
```
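
If you prefer to fetch the files programmatically, here is a minimal sketch using `huggingface_hub` (the repo id and target directory below are assumptions; substitute the actual values):

```python
# Download the whole repository to a local directory on the device
# (sketch; repo_id is a placeholder -- use this repository's real id).
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="AXERA-TECH/InternVL3-1B",  # hypothetical repo id
    local_dir="./internvl3-1b",         # where the files should land
)
```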

#### Install transformers

```
pip install transformers==4.41.1
```
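
A quick sanity check that the pinned version is the one Python actually picks up:

```python
# Confirm the transformers version expected by the tokenizer script.
import transformers

print(transformers.__version__)  # expected: 4.41.1
```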

#### Start the Tokenizer service

```
root@ax650:/mnt/qtang/llm-test/internvl3-1b# python3 internvl3_tokenizer.py
None None 151645 <|im_end|> 151665 151667
context_len is 256
prompt is <|im_start|>system
你是书生·万象, 英文名是InternVL, 是由上海人工智能实验室、清华大学及多家合作单位联合开发的多模态大语言模型.<|im_end|>
......
http://0.0.0.0:12345
```
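
The ids printed above (151645, 151665, 151667) come from the bundled tokenizer. The sketch below shows one way to inspect them with `transformers`, assuming `internvl3_tokenizer` is a standard Hugging Face tokenizer directory (the service script itself does more than this):

```python
# Look up the special tokens behind the ids shown by the tokenizer service.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("./internvl3_tokenizer", trust_remote_code=True)

print(tok.convert_tokens_to_ids("<|im_end|>"))  # 151645, the eos_id in the run log
# 151665 / 151667 (image start / image context in the log) are added special tokens:
for token_id, added in sorted(tok.added_tokens_decoder.items()):
    print(token_id, added.content)
```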

#### Inference on an AX650 host, such as the M4N-Dock(爱芯派Pro) or the AX650 DEMO Board

- input text (Chinese for "describe the picture"):

```
描述下图片
```

- input image

![](./ssd_car.jpg)

Open another terminal and run `./run_internvl_3_1b_448_ax650.sh`

```
root@ax650:/mnt/qtang/llm-test/internvl3-1b# ./run_internvl_3_1b_448_ax650.sh
[I][ Init][ 134]: LLM init start
[I][ Init][ 34]: connect http://0.0.0.0:12345 ok
bos_id: -1, eos_id: 151645
img_start_token: 151665
img_context_token: 151667
3% | ██ | 1 / 27 [0.01s<0.32s, 83.33 count/s] tokenizer init ok
[I][ Init][ 45]: LLaMaEmbedSelector use mmap
7% | ███ | 2 / 27 [0.01s<0.19s, 142.86 count/s] embed_selector init ok
100% | ████████████████████████████████ | 27 / 27 [6.92s<6.92s, 3.90 count/s] init post axmodel ok,remain_cmm(11068 MB)
[I][ Init][ 226]: IMAGE_CONTEXT_TOKEN: 151667, IMAGE_START_TOKEN: 151665
[I][ Init][ 251]: image encoder input nchw@float32
[I][ Init][ 281]: image encoder output float32
[I][ Init][ 291]: image_encoder_height : 448, image_encoder_width: 448
[I][ Init][ 293]: max_token_len : 2047
[I][ Init][ 296]: kv_cache_size : 128, kv_cache_num: 2047
[I][ Init][ 304]: prefill_token_num : 128
[I][ Init][ 308]: grp: 1, prefill_max_token_num : 1
[I][ Init][ 308]: grp: 2, prefill_max_token_num : 128
[I][ Init][ 308]: grp: 3, prefill_max_token_num : 256
[I][ Init][ 308]: grp: 4, prefill_max_token_num : 384
[I][ Init][ 308]: grp: 5, prefill_max_token_num : 512
[I][ Init][ 308]: grp: 6, prefill_max_token_num : 640
[I][ Init][ 308]: grp: 7, prefill_max_token_num : 768
[I][ Init][ 308]: grp: 8, prefill_max_token_num : 896
[I][ Init][ 308]: grp: 9, prefill_max_token_num : 1024
[I][ Init][ 312]: prefill_max_token_num : 1024
[I][ load_config][ 282]: load config:
{
"enable_repetition_penalty": false,
"enable_temperature": true,
"enable_top_k_sampling": true,
"enable_top_p_sampling": false,
"penalty_window": 20,
"repetition_penalty": 1.2,
"temperature": 0.9,
"top_k": 10,
"top_p": 0.8
}

[I][ Init][ 321]: LLM init ok
Type "q" to exit, Ctrl+c to stop current running
prompt >> 描述下图片
image >> ssd_car.jpg
[I][ Encode][ 415]: image encode time : 387.35 ms, size : 229376
[I][ Encode][ 524]: idx:0 offset : 50 out_embed.size() : 279552
[I][ Run][ 551]: input token num : 312, prefill_split_num : 3
[I][ Run][ 566]: prefill grpid 4
[I][ Run][ 593]: input_num_token:128
[I][ Run][ 593]: input_num_token:128
[I][ Run][ 593]: input_num_token:56
[I][ Run][ 717]: ttft: 623.71 ms
图片中出现的物体包括:

1. 一辆红色的双层巴士,巴士上有一则广告,广告上写着“THINGS GET MORE EXCITING WHEN YOU SAY YES” (当你说"是"时,事情就更兴奋了)。
2. 一位微笑的女性站在巴士旁边。
3. 一辆黑色的汽车停在路边。
4. 一家商店的橱窗。
5. 一些建筑物的外墙和窗户。
6. 一根黑色的路灯杆。

这些是图片中实际存在的物体。

[N][ Run][ 826]: hit eos,avg 28.78 token/s

prompt >> q

```
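
The sampling settings printed in the `load config` block are presumably read from `post_config.json` in the file list above (that mapping is an assumption). A minimal sketch for adjusting them before a run:

```python
# Edit the runtime sampling settings (sketch; assumes the "load config" block
# above is loaded from post_config.json).
import json

with open("post_config.json") as f:
    cfg = json.load(f)

cfg["temperature"] = 0.7             # lower temperature for steadier output
cfg["enable_top_p_sampling"] = True  # switch on nucleus sampling
cfg["top_p"] = 0.8

with open("post_config.json", "w") as f:
    json.dump(cfg, f, indent=2, ensure_ascii=False)
```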