Repetitive

by AliceThirty - opened Jul 20

Jul 20

Hi, I just made a GGUF quantization of your model. But the Q4_K_M version feels really really really repetitive after just 4000 tokens (haven't tried other quantization versions). Is that just me or is the base model repetitive too?

Delta-Vector

Owner Jul 20

This seems to be an issue with GGUF more then anything.

https://huggingface.co/ArtusDev/Delta-Vector_Shimamura-70B-EXL3

I would use EXL3 over GGUF, I don't endorse any GGUF quants made of my models because the format is very prone to breaking my models in weird ways.

AliceThirty

Jul 21

•

edited Jul 21

Thank you, I didn't know GGUF would break models.
Also, I never used the EXL3 format before. Would you have any recommendations on how to inference it on windows, with an API compatible with sillytavern? Thank you.

Delta-Vector

Owner Jul 21

TabbyAPI https://github.com/theroyallab/tabbyAPI

EXL2/EXL3 should be good, If you are using lower BPW quants, EXL3 will give more quality but will be slower, If you don't mind that, EXL2 will be faster but a bit more damaged.

there's alot of setup guides for tabby aswell as guides to quanting the model.

Delta-Vector changed discussion status to closed Jul 21

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment