|
--- |
|
inference: false |
|
language: |
|
- en |
|
pipeline_tag: text-generation |
|
tags: |
|
- llama |
|
- llama-2 |
|
license: llama2 |
|
--- |
|
# CalliopeDS-v2-L2-13B-exl2 |
|
|
|
Exllama v2 quant of [Doctor-Shotgun/CalliopeDS-v2-L2-13B](https://huggingface.co/Doctor-Shotgun/CalliopeDS-v2-L2-13B) |
|
|
|
Branches: |
|
- main: `measurement.json` calculated with 2048-token calibration rows on the PIPPA dataset
|
- 4.0bpw-h6: 4 decoder bits per weight, 6 head bits

  - ideal for 12GB GPUs, or 16GB GPUs with NTK-extended context or CFG
|
- 6.0bpw-h6: 6 decoder bits per weight, 6 head bits

  - ideal for 16GB GPUs, or 24GB GPUs with NTK-extended context or CFG
|
- 8bit-32g-h8: all tensors 8-bit with group size 32, 8 head bits

  - experimental quant, produced with exllamav2 monkeypatched to quantize all tensors to 8-bit with group size 32

  - similar in size to the old GPTQ 8-bit no-groupsize format; a 24GB GPU is recommended
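The GPU recommendations above follow from the raw weight-storage cost: parameter count × bits per weight. A minimal sketch of that arithmetic for a 13B model (these are lower bounds only — actual VRAM usage also includes the KV cache, activations, and group-size/head-bit overhead not modeled here):

```python
def weight_gb(params_billion: float, bpw: float) -> float:
    """Approximate weight storage in GB: params * bits-per-weight / 8."""
    bits = params_billion * 1e9 * bpw
    return bits / 8 / 1e9  # bits -> bytes -> GB

# Rough weight footprint of a 13B model at each quant level
for bpw in (4.0, 6.0, 8.0):
    print(f"{bpw} bpw: ~{weight_gb(13, bpw):.1f} GB")
# 4.0 bpw: ~6.5 GB   -> fits a 12GB card with room for context
# 6.0 bpw: ~9.8 GB   -> fits a 16GB card with room for context
# 8.0 bpw: ~13.0 GB  -> wants a 24GB card
```

This is why lower-bpw branches free up headroom for NTK-extended context or CFG (which roughly doubles KV-cache usage) on the same card.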