Commit dd1b693 by qqc1989 (verified) · Parent: 08e1411

Update README.md

Files changed (1): README.md (+29 −7)

README.md CHANGED
@@ -4,12 +4,14 @@ language:
 - zh
 - en
 base_model:
-- Qwen/Qwen2.5-1.5B-Instruct
+- Qwen/Qwen2.5-1.5B-Instruct-GPTQ-INT8
+- Qwen/Qwen2.5-1.5B-Instruct-GPTQ-INT4
 pipeline_tag: text-generation
 library_name: transformers
 tags:
 - Context
-- Qwen2.5-1.5B
+- Qwen2.5-1.5B-Instruct-GPTQ-INT8
+- Qwen2.5-1.5B-Instruct-GPTQ-INT4
 ---
 
 # Qwen2.5-1.5B-Instruct-CTX-Int8
@@ -18,7 +20,7 @@ This version of Qwen2.5-1.5B-Instruct-CTX-Int8 has been converted to run on the
 
 This model has been optimized with the following LoRA:
 
-Compatible with Pulsar2 version: 4.0 (not released yet)
+Compatible with Pulsar2 version: 4.1
 
 ## Feature
 
@@ -36,6 +38,23 @@ For those who are interested in model conversion, you can try to export axmodel
 
 [AXera NPU AXCL LLM Runtime](https://github.com/ZHEQIUSHUI/ax-llm/tree/axcl-context-kvcache)
 
+### Convert script
+
+```
+pulsar2 llm_build --input_path Qwen/Qwen2.5-1.5B-Instruct-GPTQ-Int8 \
+--output_path Qwen/Qwen2.5-1.5B-Instruct-GPTQ-Int8-ctx-ax650 \
+--hidden_state_type bf16 --kv_cache_len 2047 --prefill_len 128 \
+--last_kv_cache_len 128 \
+--last_kv_cache_len 256 \
+--last_kv_cache_len 384 \
+--last_kv_cache_len 512 \
+--last_kv_cache_len 640 \
+--last_kv_cache_len 768 \
+--last_kv_cache_len 896 \
+--last_kv_cache_len 1024 \
+--chip AX650 -c 1 --parallel 8
+```
+
 ## Support Platform
 
 - AX650
@@ -47,7 +66,7 @@ For those who are interested in model conversion, you can try to export axmodel
 
 |Chips|w8a16|w4a16| DDR | Flash |
 |--|--|--|--|--|
-|AX650| 11 tokens/sec| *TBD* | 2.3GB | 2.3GB |
+|AX650| 12 tokens/sec| 17 tokens/sec | 2.3GB | 2.3GB |
 
 ## How to use
 
@@ -56,17 +75,20 @@ Download all files from this repository to the device
 ```
 root@ax650:/mnt/qtang/llm-test/Qwen2.5-1.5B-Instruct-CTX-Int8# tree -L 1
 .
-├── kvcache
-├── main
+├── main_api
+├── main_ax650
 ├── main_axcl_aarch64
 ├── main_axcl_x86
 ├── post_config.json
 ├── qwen2.5-1.5b-ctx-ax650
+├── qwen2.5-1.5b-ctx-int4-ax650
 ├── qwen2.5_tokenizer
 ├── qwen2.5_tokenizer_uid.py
+├── run_qwen2.5_1.5b_ctx_ax650_api.sh
 ├── run_qwen2.5_1.5b_ctx_ax650.sh
 ├── run_qwen2.5_1.5b_ctx_axcl_aarch64.sh
-└── run_qwen2.5_1.5b_ctx_axcl_x86.sh
+├── run_qwen2.5_1.5b_ctx_axcl_x86.sh
+└── run_qwen2.5_1.5b_ctx_int4_ax650.sh
 ```
 
 #### Start the Tokenizer service
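The convert script added above passes `--last_kv_cache_len` eight times, compiling prefill graphs with KV-cache capacities stepped in multiples of `--prefill_len 128`. A minimal sketch of how a runtime could select a tier for a given number of already-cached tokens — the tier values come from the command in the diff, but the selection logic itself is an assumption for illustration, not documented Pulsar2/ax-llm behavior:

```python
# Tier values taken from the --last_kv_cache_len flags in the convert script.
TIERS = [128, 256, 384, 512, 640, 768, 896, 1024]

def pick_tier(cached_tokens: int) -> int:
    """Return the smallest compiled KV-cache tier that holds the cached tokens.

    Hypothetical helper: illustrates why several stepped graph variants are
    compiled, so prefill can run in the tightest graph that still fits.
    """
    for tier in TIERS:
        if cached_tokens <= tier:
            return tier
    raise ValueError("prompt exceeds the largest compiled tier")

# A 500-token prompt fits the 512 tier; one more than 512 needs the 640 tier.
print(pick_tier(500), pick_tier(513))
```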