|
--- |
|
inference: false |
|
language: |
|
- en |
|
pipeline_tag: text-generation |
|
tags: |
|
- llama |
|
- llama-2 |
|
license: llama2 |
|
--- |
|
# CalliopeDS-v2-L2-13B-exl2 |
|
|
|
Exllama v2 quant of [Doctor-Shotgun/CalliopeDS-v2-L2-13B](https://huggingface.co/Doctor-Shotgun/CalliopeDS-v2-L2-13B) |
|
|
|
Branches: |
|
- main: `measurement.json` calculated with 2048-token calibration rows on the PIPPA dataset
|
- 4.0bpw-h6: 4 decoder bits per weight, 6 head bits

  - ideal for 12GB GPUs, or 16GB GPUs with NTK-extended context or CFG
|
- 6.0bpw-h6: 6 decoder bits per weight, 6 head bits

  - ideal for 16GB GPUs, or 24GB GPUs with NTK-extended context or CFG
|
- 8bit-32g-h8: all tensors 8-bit with group size 32, 8 head bits

  - experimental quant, produced with exllamav2 monkeypatched to quantize all tensors to 8-bit with group size 32

  - similar in size to the old GPTQ 8-bit no-groupsize format; a 24GB GPU is recommended
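The GPU recommendations above follow from the raw weight-storage cost: parameter count × bits per weight. A minimal sketch of that arithmetic for a 13B model (these are lower bounds only — actual VRAM usage also includes the KV cache, activations, and group-size/head-bit overhead not modeled here):

```python
def weight_gb(params_billion: float, bpw: float) -> float:
    """Approximate weight storage in GB: params * bits-per-weight / 8."""
    bits = params_billion * 1e9 * bpw
    return bits / 8 / 1e9  # bits -> bytes -> GB

# Rough weight footprint of a 13B model at each quant level
for bpw in (4.0, 6.0, 8.0):
    print(f"{bpw} bpw: ~{weight_gb(13, bpw):.1f} GB")
# 4.0 bpw: ~6.5 GB   -> fits a 12GB card with room for context
# 6.0 bpw: ~9.8 GB   -> fits a 16GB card with room for context
# 8.0 bpw: ~13.0 GB  -> wants a 24GB card
```

This is why lower-bpw branches free up headroom for NTK-extended context or CFG (which roughly doubles KV-cache usage) on the same card.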