shenzhi-wang committed ed7451f (verified) · Parent: 2ddc42c

Update README.md

Files changed: README.md (+32 -9)
# 2. Usage

## 2.1 Usage of Our BF16 Model

1. Please upgrade the `transformers` package to ensure that it supports Llama3.1 models; the version we are using is `4.43.0`.

2. Use the following Python script to download our BF16 model:

```python
from huggingface_hub import snapshot_download

# Download our BF16 model without downloading the GGUF files.
snapshot_download(repo_id="shenzhi-wang/Llama3.1-8B-Chinese-Chat", ignore_patterns=["*.gguf"])
```
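
   `snapshot_download` returns the path of the downloaded snapshot (by default inside the Hugging Face cache). If you want the files in an explicit directory, you can pass `local_dir`; a minimal sketch, where the path is only a placeholder:

```python
from huggingface_hub import snapshot_download

# The local_dir below is a placeholder path; point it wherever you like.
local_path = snapshot_download(
    repo_id="shenzhi-wang/Llama3.1-8B-Chinese-Chat",
    ignore_patterns=["*.gguf"],
    local_dir="/Your/Local/Path/to/Llama3.1-8B-Chinese-Chat",
)
print(local_path)  # Use this path as model_id in the inference script below.
```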

3. Inference with the BF16 model:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Point this at the local directory where you downloaded the model.
model_id = "/Your/Local/Path/to/Llama3.1-8B-Chinese-Chat"
dtype = torch.bfloat16

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="cuda",
    torch_dtype=dtype,
)

chat = [
    # "Write a poem about machine learning."
    {"role": "user", "content": "写一首关于机器学习的诗。"},
]
input_ids = tokenizer.apply_chat_template(
    chat, tokenize=True, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(
    input_ids,
    max_new_tokens=8192,
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
)
# Decode only the newly generated tokens, not the prompt.
response = outputs[0][input_ids.shape[-1]:]
print(tokenizer.decode(response, skip_special_tokens=True))
```
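
If you would rather watch the reply appear token by token instead of waiting for the full generation, `transformers` ships a `TextStreamer` that can be passed to `generate`. A minimal sketch, reusing `model`, `tokenizer`, and `input_ids` from the script above:

```python
from transformers import TextStreamer

# Print tokens as they are produced; skip_prompt suppresses the echoed input.
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
model.generate(
    input_ids,
    max_new_tokens=8192,
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
    streamer=streamer,
)
```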
101
+
102
+ ## 2.2 Usage of Our GGUF Models
103
+
104
+ 1. Download our GGUF models from the [gguf_models folder](https://huggingface.co/shenzhi-wang/Llama3.1-8B-Chinese-Chat/tree/main/gguf);
105
+ 2. Use the GGUF models with [LM Studio](https://lmstudio.ai/);
106
+ 3. Or, you can follow the instructions from https://github.com/ggerganov/llama.cpp/tree/master#usage to use gguf models.
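
If you prefer to stay in Python for option 3, the `llama-cpp-python` bindings (not covered by this repo's instructions, so treat this as an assumption) can load the GGUF files directly. The filename below is hypothetical; substitute the quantization you actually downloaded:

```python
from llama_cpp import Llama  # pip install llama-cpp-python

# Hypothetical filename; use the GGUF file you actually downloaded.
llm = Llama(model_path="llama3.1_8b_chinese_chat_q4_k_m.gguf", n_ctx=8192)
result = llm.create_chat_completion(
    messages=[{"role": "user", "content": "写一首关于机器学习的诗。"}],  # "Write a poem about machine learning."
    temperature=0.6,
    top_p=0.9,
)
print(result["choices"][0]["message"]["content"])
```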