jwnder/core42_jais-13b-chat-bnb-4bit

Text Generation

4-bit precision

jwnder committed on Apr 6, 2024

Commit 998a974 · verified · 1 Parent(s): c8a9a3d

Update README.md

Files changed (1)

README.md +36 -0

@@ -7,7 +7,43 @@ This is a quantized version of the Jais-13b-chat model
 To load this model you will need the bitsandbytes quantization method
+If you are using text-generation-webui, select the Transformers loader.
 - Compute d-type: bfloat16
 - Quantization Type : nf4
 - Load in 4-bit: True
 - Use double quantization: True
+```python
+from datetime import datetime
+import warnings
+
+import torch
+from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
+
+warnings.filterwarnings('ignore')
+
+model_name = "jwnder/core42_jais-13b-chat-bnb-4bit"
+
+# 4-bit NF4 quantization settings, matching the list above
+bnb_config = BitsAndBytesConfig(
+    load_in_4bit=True,
+    bnb_4bit_quant_type="nf4",
+    bnb_4bit_compute_dtype=torch.bfloat16,
+    bnb_4bit_use_double_quant=True,
+    llm_int8_enable_fp32_cpu_offload=True
+)
+tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
+model = AutoModelForCausalLM.from_pretrained(
+    model_name,
+    quantization_config=bnb_config,
+    device_map="auto",
+    trust_remote_code=True
+)
+# Put the tokenized prompt on the same device as the model before generating
+inputs = tokenizer("Testing LLM!", return_tensors="pt").to(model.device)
+start = datetime.now()
+outputs = model.generate(**inputs)
+end = datetime.now()
+print(f"Generation took {end - start}")
+print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
+```
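
As a rough guide to what the 4-bit and double-quantization settings buy, the sketch below estimates the memory taken by the quantized weights alone. It is not part of the commit. The parameter count (13 billion, taken from the model name), the block sizes (64 weights per block, 256 constants per second-level block), and the function name `estimate_weight_memory_gib` are assumptions for illustration; the block sizes follow the figures given in the QLoRA paper. The estimate ignores layers that stay unquantized, activations, and the KV cache, so real usage will be higher.

```python
def estimate_weight_memory_gib(
    n_params: float,
    bits_per_weight: float = 4.0,
    block_size: int = 64,        # assumed: weights per quantization block
    double_quant: bool = True,
    dq_block_size: int = 256,    # assumed: constants per second-level block
) -> float:
    """Approximate size of block-quantized weights in GiB (weights only)."""
    if double_quant:
        # 8-bit quantized constant per block, plus one fp32 constant
        # shared by every dq_block_size first-level constants
        overhead_bits = 8 / block_size + 32 / (block_size * dq_block_size)
    else:
        # one fp32 scaling constant per block
        overhead_bits = 32 / block_size
    total_bits = n_params * (bits_per_weight + overhead_bits)
    return total_bits / 8 / 2**30


if __name__ == "__main__":
    n = 13e9  # nominal parameter count, assumed from the "13b" in the model name
    with_dq = estimate_weight_memory_gib(n, double_quant=True)
    without_dq = estimate_weight_memory_gib(n, double_quant=False)
    print(f"with double quantization:    {with_dq:.2f} GiB")
    print(f"without double quantization: {without_dq:.2f} GiB")
```

Under these assumptions, double quantization lowers the per-weight overhead of the scaling constants from 0.5 bits to about 0.127 bits, which is where the saving comes from.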