EXL2 quants of alpindale/goliath-120b (https://huggingface.co/alpindale/goliath-120b), to be used on exllamav2.
Calibration dataset is wikitext. I've added a measurement.json file on the main branch if you want to do your own quants.
IMPORTANT: For the 3BPW quant, if you are using ooba's text-generation-webui, disable the BOS token, otherwise you will get gibberish. See https://huggingface.co/Panchovix/goliath-120b-exl2/discussions/1
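If you want to reuse the measurement.json to make a quant at a different bitrate, the sketch below assembles a command line for exllamav2's convert.py. The flags (-i, -o, -cf, -b, -m) are given as I understand that script's options, so check them against `python convert.py -h` in your exllamav2 checkout first. All directory names and the 4.5 bpw target are hypothetical placeholders, not values from this repo.

```python
import shlex


def build_convert_command(model_dir, work_dir, out_dir, bpw,
                          measurement="measurement.json"):
    """Return an argv list for exllamav2's convert.py.

    Assumption: the flag names below match the convert.py in your checkout.
    The function only builds the command; it does not run anything.
    """
    return [
        "python", "convert.py",
        "-i", model_dir,      # unquantized FP16 model directory
        "-o", work_dir,       # scratch directory for intermediate files
        "-cf", out_dir,       # directory that receives the finished quant
        "-b", str(bpw),       # target bits per weight
        "-m", measurement,    # reuse an existing measurement pass
    ]


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    cmd = build_convert_command(
        "goliath-120b", "work", "goliath-120b-4.5bpw-exl2", 4.5
    )
    print(shlex.join(cmd))
```

Passing the measurement file skips the measurement pass, which is the slow part of quantizing a model this size.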
Original Model card
Goliath 120B
An auto-regressive causal LM created by combining two fine-tuned Llama-2 70B models into one.
Please check out the quantized formats provided by @TheBloke and @Panchovix:
- GGUF (llama.cpp)
- GPTQ (KoboldAI, TGW, Aphrodite)
- AWQ (TGW, Aphrodite, vLLM)
- Exllamav2 (TGW, KoboldAI)
Prompting Format
Both Vicuna and Alpaca will work, but because the initial and final layers belong primarily to Xwin, I expect Vicuna to work best.
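For reference, here is a minimal sketch of the two prompt formats as they are commonly written. The template strings are the widely used Vicuna v1.1 and Alpaca defaults, not something specified by this model card, and the function names are made up for illustration.

```python
# Commonly used Vicuna v1.1 system prompt (assumed, not from this model card).
VICUNA_SYSTEM = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's "
    "questions."
)


def vicuna_prompt(user_message, system=VICUNA_SYSTEM):
    """Single-turn Vicuna-style prompt, ending where the model should reply."""
    return f"{system} USER: {user_message} ASSISTANT:"


def alpaca_prompt(instruction):
    """Single-turn Alpaca-style prompt without the optional input field."""
    return (
        "Below is an instruction that describes a task. "
        "Write a response that appropriately completes the request.\n\n"
        f"### Instruction:\n{instruction}\n\n### Response:\n"
    )


if __name__ == "__main__":
    print(vicuna_prompt("Summarize the merge process in one sentence."))
    print(alpaca_prompt("Summarize the merge process in one sentence."))
```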
Merge process
The models used in the merge are Xwin and Euryale.
The layer ranges used are as follows:
- range 0, 16: Xwin
- range 8, 24: Euryale
- range 17, 32: Xwin
- range 25, 40: Euryale
- range 33, 48: Xwin
- range 41, 56: Euryale
- range 49, 64: Xwin
- range 57, 72: Euryale
- range 65, 80: Xwin
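To make the interleaving easier to reason about, the ranges above can be written out as data and tallied. This is only a sketch: it assumes each range is half-open, [start, end), which is my understanding of how mergekit reads layer ranges, and the variable and function names are made up for illustration.

```python
# Layer slices exactly as listed above: (source model, start, end).
SLICES = [
    ("Xwin", 0, 16),
    ("Euryale", 8, 24),
    ("Xwin", 17, 32),
    ("Euryale", 25, 40),
    ("Xwin", 33, 48),
    ("Euryale", 41, 56),
    ("Xwin", 49, 64),
    ("Euryale", 57, 72),
    ("Xwin", 65, 80),
]


def total_layers(slices):
    """Number of layers in the stacked model, assuming half-open ranges."""
    return sum(end - start for _, start, end in slices)


def layers_per_model(slices):
    """How many layers each source model contributes to the stack."""
    counts = {}
    for model, start, end in slices:
        counts[model] = counts.get(model, 0) + (end - start)
    return counts


if __name__ == "__main__":
    print(total_layers(SLICES))      # 137
    print(layers_per_model(SLICES))  # {'Xwin': 76, 'Euryale': 61}
```

Under that assumption, Xwin supplies both the first and the last slice, which is the basis for the Vicuna recommendation in the prompting section.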
Benchmarks
Coming soon.
Acknowledgements
Credit goes to @chargoddard for developing mergekit, the framework used to merge the model.
Special thanks to @Undi95 for helping with the merge ratios.