Doctor-Shotgun committed on
Commit 4795ae3 · 1 Parent(s): dc9071c

Create README.md

Files changed (1)
  1. README.md +23 -0
README.md ADDED
@@ -0,0 +1,23 @@
+ ---
+ inference: false
+ language:
+ - en
+ pipeline_tag: text-generation
+ tags:
+ - llama
+ - llama-2
+ license: agpl-3.0
+ ---
+ # CalliopeDS-v2-L2-13B-exl2
+
+ Exllama v2 quant of [Doctor-Shotgun/CalliopeDS-v2-L2-13B](https://huggingface.co/Doctor-Shotgun/CalliopeDS-v2-L2-13B)
+
+ Branches:
+ - main: measurement.json calculated with 2048-token calibration rows on the PIPPA dataset
+ - 4.0bpw-h6: 4 decoder bits per weight, 6 head bits
+   - ideal for 12 GB GPUs, or 16 GB GPUs with NTK-extended context or CFG
+ - 6.0bpw-h6: 6 decoder bits per weight, 6 head bits
+   - ideal for 16 GB GPUs, or 24 GB GPUs with NTK-extended context or CFG
+ - 8bit-32g-h8: all tensors quantized to 8 bits with group size 32, 8 head bits
+   - experimental quant, produced with exllamav2 monkeypatched to quantize all tensors to 8-bit 32g
+   - similar in size to the old GPTQ 8-bit no-groupsize format; a 24 GB GPU is recommended
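Since each quant lives on its own branch, a minimal usage sketch looks like the following: download one branch as a revision and load it with exllamav2. The repo id is inferred from the card title, the branch, sampling values, and prompt are illustrative assumptions, and the loading calls follow exllamav2's own example code, so exact names may differ between library versions.

```python
# Sketch: fetch one quant branch and run a short generation with exllamav2.
# Assumptions: repo id inferred from the card title; branch/prompt/sampling
# values are placeholders; exllamav2 API per its examples (may vary by version).
from huggingface_hub import snapshot_download
from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2BaseGenerator, ExLlamaV2Sampler

# Download only the branch (revision) you want, e.g. the 4.0bpw-h6 quant
model_dir = snapshot_download(
    repo_id="Doctor-Shotgun/CalliopeDS-v2-L2-13B-exl2",
    revision="4.0bpw-h6",
)

# Load the quantized model, splitting it across available GPU memory
config = ExLlamaV2Config()
config.model_dir = model_dir
config.prepare()

model = ExLlamaV2(config)
cache = ExLlamaV2Cache(model, lazy=True)
model.load_autosplit(cache)
tokenizer = ExLlamaV2Tokenizer(config)

# Simple (non-streaming) generation
generator = ExLlamaV2BaseGenerator(model, cache, tokenizer)
settings = ExLlamaV2Sampler.Settings()
settings.temperature = 0.8
settings.top_p = 0.9

print(generator.generate_simple("Once upon a time,", settings, num_tokens=128))
```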