Commit 4795ae3 (parent: dc9071c): Create README.md

README.md (added):
---
inference: false
language:
- en
pipeline_tag: text-generation
tags:
- llama
- llama-2
license: agpl-3.0
---
# CalliopeDS-v2-L2-13B-exl2

Exllama v2 quant of [Doctor-Shotgun/CalliopeDS-v2-L2-13B](https://huggingface.co/Doctor-Shotgun/CalliopeDS-v2-L2-13B)

Branches:
- main: measurement.json, calculated with 2048-token calibration rows on PIPPA
- 4.0bpw-h6: 4 decoder bits per weight, 6 head bits
  - ideal for 12GB GPUs, or 16GB GPUs with NTK extended context or CFG
- 6.0bpw-h6: 6 decoder bits per weight, 6 head bits
  - ideal for 16GB GPUs, or 24GB GPUs with NTK extended context or CFG
- 8bit-32g-h8: all tensors 8-bit with group size 32, 8 head bits
  - experimental quant, made with exllamav2 monkeypatched to quantize all tensors to 8-bit 32g
  - similar in size to the old GPTQ 8-bit no-groupsize format; a 24GB GPU is recommended
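
Because each quant lives on its own branch, cloning `main` only fetches the measurement file. Below is a minimal download sketch using `huggingface_hub`; the repo id is assumed from this card's title and the local path is illustrative, so adjust both as needed:

```python
from huggingface_hub import snapshot_download

# Each quant lives on its own branch; select one via `revision`.
# Repo id assumed from the model card title -- adjust if it differs.
snapshot_download(
    repo_id="Doctor-Shotgun/CalliopeDS-v2-L2-13B-exl2",
    revision="4.0bpw-h6",  # or "6.0bpw-h6", "8bit-32g-h8"
    local_dir="models/CalliopeDS-v2-L2-13B-exl2",
)
```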
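
Once downloaded, the quant can be loaded with the exllamav2 Python API. The following is a rough sketch based on the upstream exllamav2 examples, not an official loader for this model; the model directory, sampling settings, and prompt are all placeholder assumptions:

```python
from exllamav2 import ExLlamaV2, ExLlamaV2Cache, ExLlamaV2Config, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2BaseGenerator, ExLlamaV2Sampler

# Point the config at the downloaded quant directory (illustrative path).
config = ExLlamaV2Config()
config.model_dir = "models/CalliopeDS-v2-L2-13B-exl2"
config.prepare()

# Load the model, auto-splitting layers across available GPUs.
model = ExLlamaV2(config)
cache = ExLlamaV2Cache(model, lazy=True)
model.load_autosplit(cache)

tokenizer = ExLlamaV2Tokenizer(config)
generator = ExLlamaV2BaseGenerator(model, cache, tokenizer)

# Placeholder sampling settings; tune for your use case.
settings = ExLlamaV2Sampler.Settings()
settings.temperature = 0.8
settings.top_p = 0.9

print(generator.generate_simple("Once upon a time,", settings, num_tokens=128))
```

For the 4.0bpw-h6 and 6.0bpw-h6 branches on smaller GPUs, the same sketch applies; the head bits and decoder bits are baked into the quant, so no extra flags are needed at load time.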