---
inference: false
language:
- en
pipeline_tag: text-generation
tags:
- llama
- llama-2
license: llama2
---
# CalliopeDS-v2-L2-13B-exl2

Exllama v2 quant of [Doctor-Shotgun/CalliopeDS-v2-L2-13B](https://huggingface.co/Doctor-Shotgun/CalliopeDS-v2-L2-13B)

Branches:
- main: `measurement.json` calculated with 2048-token calibration rows on PIPPA
- 4.0bpw-h6: 4 decoder bits per weight, 6 head bits
  - ideal for 12 GB GPUs, or 16 GB GPUs with NTK extended context or CFG
- 6.0bpw-h6: 6 decoder bits per weight, 6 head bits
  - ideal for 16 GB GPUs, or 24 GB GPUs with NTK extended context or CFG
- 8bit-32g-h8: all tensors quantized to 8 bits with group size 32, 8 head bits
  - experimental quant, produced with exllamav2 monkeypatched to quantize all tensors to 8-bit 32g
  - similar in size to the old GPTQ 8-bit no-groupsize format; a 24 GB GPU is recommended
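The GPU recommendations above follow from a simple rule of thumb: quantized weight size is roughly parameter count times bits per weight divided by 8. A minimal sketch (illustrative only; the parameter count is the nominal ~13B of the base model, and real VRAM usage adds context cache and runtime overhead on top of the weights):

```python
# Rough VRAM estimate for quantized weights: params * bpw / 8 bytes.
# Does NOT include KV cache or framework overhead, which is why e.g.
# the 4.0bpw quant targets 12 GB GPUs rather than fitting exactly in 6.5 GB.

PARAMS = 13e9  # nominal parameter count of a Llama-2 13B model


def weight_size_gb(bpw: float, params: float = PARAMS) -> float:
    """Approximate size of the quantized weights in gigabytes."""
    return params * bpw / 8 / 1e9


for bpw in (4.0, 6.0, 8.0):
    print(f"{bpw:.1f} bpw -> ~{weight_size_gb(bpw):.1f} GB of weights")
```

This is why the 4.0bpw branch (~6.5 GB of weights) leaves headroom on a 12 GB card, while the 8-bit quant (~13 GB) calls for 24 GB.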