---
tags:
- npu
- amd
- llama3.1
- RyzenAI
---

This model is a fine-tuned version of [meta-llama/Meta-Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct), AWQ-quantized and converted to run on an [NPU-equipped Ryzen AI PC](https://github.com/amd/RyzenAI-SW/issues/18), for example one with a Ryzen 9 7940HS processor.

To set up Ryzen AI for LLMs on Windows 11, see [Running LLM on AMD NPU Hardware](https://www.hackster.io/gharada2013/running-llm-on-amd-npu-hardware-19322f).

The following sample assumes that the setup on the above page has been completed.

This model has only been tested with Ryzen AI on Windows 11. It does not work in Linux environments such as WSL.
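
Because the NPU runtime is Windows-only, a script can fail fast on unsupported platforms. A small optional guard along these lines (an illustration of my own, not part of the original sample):

```
import platform
import sys

# The Ryzen AI NPU runtime used below only works on native Windows 11,
# not under Linux or WSL, so exit early anywhere else.
if platform.system() != "Windows":
    sys.exit("Requires native Windows 11 with the Ryzen AI NPU runtime.")
```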

RoPE support is not yet complete, but perplexity has been confirmed to be lower than that of Llama 3.
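
The evaluation text and settings behind that comparison are not published here. As a rough illustration, next-token perplexity over a sample text could be measured along these lines — a minimal sketch assuming the model and tokenizer are loaded as in the Sample Script below and follow the Hugging Face causal-LM calling convention:

```
import torch
from torch.nn import functional as F

def perplexity(model, tokenizer, text, max_len=512):
    # Tokenize the evaluation text and truncate to a manageable length.
    ids = tokenizer(text, return_tensors="pt").input_ids[:, :max_len]
    with torch.no_grad():
        # Assumes the model returns HF-style outputs with a .logits field.
        logits = model(ids).logits
    # Shift so each position predicts the following token.
    shift_logits = logits[:, :-1, :].float()
    shift_labels = ids[:, 1:]
    loss = F.cross_entropy(
        shift_logits.reshape(-1, shift_logits.size(-1)),
        shift_labels.reshape(-1))
    # Perplexity is the exponential of the mean cross-entropy.
    return torch.exp(loss).item()
```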

2024/07/30
- [Ryzen AI Software 1.2](https://ryzenai.docs.amd.com/en/latest/) has been released. Please note that this model is based on [Ryzen AI Software 1.1](https://ryzenai.docs.amd.com/en/1.1/index.html) and operation with 1.2 has not been confirmed.

### Setup
In a Windows command prompt:
```
conda activate ryzenai-transformers
<your_install_path>\RyzenAI-SW\example\transformers\setup.bat

pip install transformers==4.43.3
# Updating the Transformers library will cause the Llama 2 sample to stop working.
# If you want to run Llama 2, revert with pip install transformers==4.34.0.
pip install tokenizers==0.19.1
pip install -U "huggingface_hub[cli]"

huggingface-cli download dahara1/llama3.1-8b-Instruct-amd-npu --revision main --local-dir llama3.1-8b-Instruct-amd-npu

copy <your_ryzen_ai-sw_install_path>\RyzenAI-SW\example\transformers\models\llama2\modeling_llama_amd.py .

# Set up the runtime. See https://ryzenai.docs.amd.com/en/latest/runtime_setup.html
set XLNX_VART_FIRMWARE=<your_firmware_install_path>\voe-4.0-win_amd64\1x4.xclbin
set NUM_OF_DPU_RUNNERS=1

# Save the sample script below as llama3.1-test.py (UTF-8 encoded), then run it.
python llama3.1-test.py
```
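
Before launching the script, it can be worth confirming that the runtime variables set above are actually visible to Python. A quick sanity check (my addition for illustration, not part of the official setup):

```
import os

# Print the NPU runtime variables configured in the setup steps above.
for var in ("XLNX_VART_FIRMWARE", "NUM_OF_DPU_RUNNERS"):
    print(var, "=", os.environ.get(var, "<not set>"))
```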

### Sample Script

```
import torch
import psutil
import transformers
from transformers import AutoTokenizer, set_seed
import qlinear
import logging

set_seed(123)
transformers.logging.set_verbosity_error()
logging.disable(logging.CRITICAL)

messages = [
    {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
]

message_list = [
    "Who are you? ",
    # Japanese: "What is the name of the ship you are on? Reply entirely in Japanese, not English."
    "あなたの乗っている船の名前は何ですか?英語ではなく全て日本語だけを使って返事をしてください",
    # Chinese: "What is the most dangerous adventure you have experienced? Answer everything in Chinese, not English."
    "你经历过的最危险的冒险是什么?请用中文回答所有问题,不要用英文。",
    # French: "How fast does your ship go? Answer only in French, not English."
    "À quelle vitesse va votre bateau ? Veuillez répondre uniquement en français et non en anglais.",
    # Korean: "What do you like about that ship? Answer entirely in Korean, without using English."
    "당신은 그 배의 어디를 좋아합니까? 영어를 사용하지 않고 모두 한국어로 대답하십시오.",
    # German: "What would your ship's name be in German? Answer in German instead of English."
    "Wie würde Ihr Schiffsname auf Deutsch lauten? Bitte antwortet alle auf Deutsch statt auf Englisch.",
    # Taiwanese: "What is the most amazing treasure you have found? Answer only in Taiwanese and Traditional Chinese, not English."
    "您發現過的最令人驚奇的寶藏是什麼?請僅使用台語和繁體中文回答,不要使用英文。",
]


if __name__ == "__main__":
    # Pin the host threads to four CPU cores; the quantized layers run on the NPU.
    p = psutil.Process()
    p.cpu_affinity([0, 1, 2, 3])
    torch.set_num_threads(4)

    tokenizer = AutoTokenizer.from_pretrained("llama3.1-8b-Instruct-amd-npu")
    ckpt = r"llama3.1-8b-Instruct-amd-npu\llama3.1_8b_w_bit_4_awq_amd.pt"
    terminators = [
        tokenizer.eos_token_id,
        tokenizer.convert_tokens_to_ids("<|eot_id|>")
    ]
    model = torch.load(ckpt)
    model.eval()
    model = model.to(torch.bfloat16)

    # Move the AWQ-quantized linear layers to the NPU (AIE) and prepare their weights.
    for n, m in model.named_modules():
        if isinstance(m, qlinear.QLinearPerGrp):
            print(f"Preparing weights of layer : {n}")
            m.device = "aie"
            m.quantize_weights()

    print("system: " + messages[0]['content'])

    for i in range(len(message_list)):
        messages.append({"role": "user", "content": message_list[i]})
        print("user: " + message_list[i])

        inputs = tokenizer.apply_chat_template(
            messages,
            add_generation_prompt=True,
            return_tensors="pt",
            return_dict=True
        )

        outputs = model.generate(
            inputs['input_ids'],
            max_new_tokens=600,
            eos_token_id=terminators,
            attention_mask=inputs['attention_mask'],
            do_sample=True,
            temperature=0.6,
            top_p=0.9)

        # Decode only the newly generated tokens, not the prompt.
        response = outputs[0][inputs['input_ids'].shape[-1]:]
        response_message = tokenizer.decode(response, skip_special_tokens=True)
        print("assistant: " + response_message)
        # Record the reply as an assistant turn so the chat template stays valid.
        messages.append({"role": "assistant", "content": response_message})
```
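
Since `messages` accumulates every user prompt and assistant reply, each of the seven turns re-sends the full conversation, which is what keeps the pirate persona consistent across languages; for much longer sessions you may want to trim old turns to keep the prompt length down.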

![chat_image](llama-3.1.png)

## Acknowledgements
- [amd/RyzenAI-SW](https://github.com/amd/RyzenAI-SW)
  Sample code and drivers.
- [mit-han-lab/llm-awq](https://github.com/mit-han-lab/llm-awq)
  Thanks for the AWQ quantization method.
- [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct)
  [Built with Meta Llama 3](https://llama.meta.com/llama3/license/)