EXL2 quants of alpindale/goliath-120b (https://huggingface.co/alpindale/goliath-120b), for use with exllamav2.

The calibration dataset is wikitext. I've added a measurement.json file on the main branch in case you want to make your own quants.
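For anyone reusing the measurement file, here is a rough sketch of how a conversion command could be put together. It assumes exllamav2's convert.py script and its -i, -o, -cf, -b and -m options; the directory names and the 4.0 bpw target are made-up placeholders, so check everything against the exllamav2 version you have installed. The snippet only builds and prints the command, it does not run it.

```python
import shlex


def build_convert_command(model_dir, work_dir, out_dir, bits,
                          measurement="measurement.json"):
    """Build (but do not run) a convert.py command line.

    All paths passed in are hypothetical placeholders.
    """
    return [
        "python", "convert.py",
        "-i", model_dir,      # unquantized FP16 model directory
        "-o", work_dir,       # scratch directory for intermediate files
        "-cf", out_dir,       # where the finished quant is written
        "-b", str(bits),      # target bits per weight
        "-m", measurement,    # reuse the existing measurement file
    ]


cmd = build_convert_command(
    "models/goliath-120b", "work/goliath", "out/goliath-4.0bpw", 4.0
)
print(shlex.join(cmd))
```

Passing an existing measurement file is meant to avoid repeating the measurement pass for every new bitrate.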

IMPORTANT: If you use the 3bpw quant with ooba's text-generation-webui, disable the BOS token, otherwise you will get gibberish. See https://huggingface.co/Panchovix/goliath-120b-exl2/discussions/1

4.5bpw

3bpw

Original Model card

Goliath 120B

An auto-regressive causal LM created by combining two fine-tuned Llama-2 70B models into one.

Please check out the quantized formats provided by @TheBloke and @Panchovix:

GGUF (llama.cpp)
GPTQ (KoboldAI, TGW, Aphrodite)
AWQ (TGW, Aphrodite, vLLM)
Exllamav2 (TGW, KoboldAI)

Prompting Format

Both Vicuna and Alpaca will work, but due to the initial and final layers belonging primarily to Xwin, I expect Vicuna to work the best.
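As a quick illustration of the two formats, here is a minimal sketch of prompt builders. The system and header strings are the commonly used Vicuna v1.1 and Alpaca wordings, which this card does not spell out, so treat them as assumptions; the function names are made up for the example.

```python
# Commonly used Vicuna v1.1 system prompt (assumed, not taken from this card).
VICUNA_SYSTEM = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's questions."
)

# Commonly used Alpaca header (assumed, not taken from this card).
ALPACA_HEADER = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request."
)


def vicuna_prompt(user_message, system=VICUNA_SYSTEM):
    """Single-turn Vicuna-style prompt; generation continues after 'ASSISTANT:'."""
    return f"{system} USER: {user_message} ASSISTANT:"


def alpaca_prompt(instruction):
    """Single-turn Alpaca-style prompt; generation continues after '### Response:'."""
    return f"{ALPACA_HEADER}\n\n### Instruction:\n{instruction}\n\n### Response:\n"


print(vicuna_prompt("Write a haiku about model merging."))
print(alpaca_prompt("Write a haiku about model merging."))
```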

Merge process

The models used in the merge are Xwin and Euryale.

The layer ranges used are as follows:

- range 0, 16
  Xwin
- range 8, 24
  Euryale
- range 17, 32
  Xwin
- range 25, 40
  Euryale
- range 33, 48
  Xwin
- range 41, 56
  Euryale
- range 49, 64
  Xwin
- range 57, 72
  Euryale
- range 65, 80
  Xwin
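To make the interleaving easier to follow, the ranges above can be expanded into the stacked layer order with a short sketch. It assumes each range is half-open (start included, end excluded), like Python's range; under that assumption the merged model has 137 layers. If the ranges were inclusive, the count would differ.

```python
# (source model, start layer, end layer), copied from the list above.
SLICES = [
    ("Xwin", 0, 16),
    ("Euryale", 8, 24),
    ("Xwin", 17, 32),
    ("Euryale", 25, 40),
    ("Xwin", 33, 48),
    ("Euryale", 41, 56),
    ("Xwin", 49, 64),
    ("Euryale", 57, 72),
    ("Xwin", 65, 80),
]


def stack_layers(slices):
    """Expand slices into an ordered list of (model, source_layer_index).

    Assumes half-open ranges: 'start' is included, 'end' is excluded.
    """
    stacked = []
    for model, start, end in slices:
        stacked.extend((model, i) for i in range(start, end))
    return stacked


layers = stack_layers(SLICES)
print(len(layers))  # 137 under the half-open assumption

# Per-model contribution to the merged stack.
for name in ("Xwin", "Euryale"):
    print(name, sum(1 for model, _ in layers if model == name))
```

Note that consecutive slices overlap in source indices (for example 8 to 15 appear in both the first Xwin slice and the first Euryale slice), which is why the stack is deeper than the 80 layers of a single source model.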

Screenshots

Benchmarks

Coming soon.

Acknowledgements

Credit goes to @chargoddard for developing the framework used to merge the model - mergekit.

Special thanks to @Undi95 for helping with the merge ratios.