This is a quantized version of the Jais-13b-chat model.

To load this model you will need the bitsandbytes quantization method.

If you are using text-generation-webui, select the Transformers loader and apply the following settings:

- Compute dtype: bfloat16
- Quantization type: nf4
- Load in 4-bit: True
- Use double quantization: True
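
To load the model in your own code instead, the example below applies the same configuration with `transformers`: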

```python
import warnings
from datetime import datetime

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Silence noisy warnings from the remote model code
warnings.filterwarnings("ignore")

model_name = "jwnder/core42_jais-13b-chat-bnb-4bit"

# 4-bit NF4 quantization with bfloat16 compute and nested (double) quantization
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
    llm_int8_enable_fp32_cpu_offload=True,  # offload to CPU in fp32 if the GPU runs out of room
)

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)

# Run a quick smoke test and time the generation
inputs = tokenizer("Testing LLM!", return_tensors="pt").to(model.device)
start = datetime.now()
outputs = model.generate(**inputs)
end = datetime.now()
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
print(f"Generation took {end - start}")
```
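
As a quick sanity check that the 4-bit weights loaded as expected, you can print the model's memory footprint. This is a minimal sketch continuing from the snippet above; `get_memory_footprint` is a standard `transformers` model method.

```python
# Report the quantized model's in-memory size in GiB
footprint_gib = model.get_memory_footprint() / 1024**3
print(f"Model memory footprint: {footprint_gib:.2f} GiB")
```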