Repetitive

#2
by AliceThirty - opened

Hi, I just made a GGUF quantization of your model. But the Q4_K_M version feels really really really repetitive after just 4000 tokens (haven't tried other quantization versions). Is that just me or is the base model repetitive too?

This seems to be an issue with GGUF more then anything.

https://huggingface.co/ArtusDev/Delta-Vector_Shimamura-70B-EXL3

I would use EXL3 over GGUF, I don't endorse any GGUF quants made of my models because the format is very prone to breaking my models in weird ways.

Thank you, I didn't know GGUF would break models.
Also, I never used the EXL3 format before. Would you have any recommendations on how to inference it on windows, with an API compatible with sillytavern? Thank you.

TabbyAPI https://github.com/theroyallab/tabbyAPI

EXL2/EXL3 should be good, If you are using lower BPW quants, EXL3 will give more quality but will be slower, If you don't mind that, EXL2 will be faster but a bit more damaged.

there's alot of setup guides for tabby aswell as guides to quanting the model.

Delta-Vector changed discussion status to closed

Sign up or log in to comment