tulu-30B merged with bhenrym14's airoboros-33b-gpt4-1.4.1-PI-8192-LoRA, quantized at 4 bit.
More info about the LoRA Here. This is an alternative to SuperHOT 8k LoRA trained with LoRA_rank 64, and airoboros 1.4.1 dataset.
It was created with GPTQ-for-LLaMA with group size 32 and act order true as parameters, to get the maximum perplexity vs FP16 model.
I HIGHLY suggest to use exllama, to evade some VRAM issues.
Use compress_pos_emb = 4 for any context up to 8192 context.
If you have 2x24 GB VRAM GPUs cards, to not get Out of Memory errors at 8192 context, use:
gpu_split: 9,21
- Downloads last month
- 9
This model does not have enough activity to be deployed to Inference API (serverless) yet. Increase its social
visibility and check back later, or deploy to Inference Endpoints (dedicated)
instead.