im-request: Athena-v4 (https://huggingface.co/IkariDev/Athena-v4)

#15
by xpgx1 - opened

IF there is a slot available in the pipeline - I'd appreciate an i-matrix treatment of one of my personal favorites - Athena Version 4 by IkariDev (IkariDev/Athena-v4 - https://huggingface.co/IkariDev/Athena-v4).

It's not a priority and there are some other flavors of it available, but nobody (so far) has used more modern methods to shrink the fp16 =)

I've just had good experiences with this particular model and would like to see how it performs as a 4-6 bpw GGUF created with the importance-matrix method. Have a great week, and thank you in advance!
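For a rough sense of what 4-6 bpw buys you: here is a back-of-envelope size estimate for a ~13B-parameter model (Athena-v4 is a Llama-2-13B-based merge; the 13e9 parameter count and the per-quant bpw figures are approximations, and real GGUF files add some overhead for metadata and mixed-precision tensors).

```python
# Approximate GGUF file size for a model at a given bits-per-weight (bpw).
# Assumption: ~13e9 parameters, uniform bpw, no metadata overhead.
def gguf_size_gib(n_params: float, bpw: float) -> float:
    return n_params * bpw / 8 / 2**30

for bpw in (4.25, 5.5, 16.0):  # roughly IQ4_XS, Q5_K_M, fp16
    print(f"{bpw:5.2f} bpw -> {gguf_size_gib(13e9, bpw):5.1f} GiB")
```

This is why the 4-6 bpw range is the sweet spot for a 12 GB card: the fp16 original lands around 24 GiB, while the 4-5 bpw quants come in well under 10 GiB with room left for context.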

xpgx1 changed discussion title from im-request: Athena-v4 to im-request: Athena-v4 (https://huggingface.co/IkariDev/Athena-v4)

Sure, it's in the queue!

But my static quants of that model do feature a few IQ3 and IQ4 ones, don't those count as more modern? :)

Heh, yes they do =)

I just want to see what the model does when we have some higher-bpw variants and can increase context (due to the lower space requirement) - my VRAM envelope is juuuust big enough to do these kinds of tests (12 GB). And importance-matrix quants behave a bit differently - at least in my testing. I am curious to see how this experimental model works when we're not in desperate low-bpw limbo territory.
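The context-vs-VRAM trade-off above can be sketched with some quick arithmetic: once the quantized weights are loaded, the leftover VRAM mostly goes to the KV cache. The architecture numbers below (40 layers, 5120 hidden dim, fp16 cache) are assumptions based on Llama-2-13B, which Athena-v4 derives from, and the estimate ignores activation and framework overhead.

```python
# Rough KV-cache budget: how many context tokens fit in the VRAM left
# over after loading the quantized weights.
# Assumptions: Llama-2-13B shape (40 layers, 5120 hidden), fp16 cache.
def kv_bytes_per_token(n_layers: int = 40, hidden: int = 5120,
                       cache_bytes: int = 2) -> int:
    return 2 * n_layers * hidden * cache_bytes  # K and V, per layer

def max_context(vram_gib: float, weights_gib: float) -> int:
    free = (vram_gib - weights_gib) * 2**30
    return int(free // kv_bytes_per_token())

# e.g. a 12 GiB card with ~6.4 GiB of IQ4_XS weights loaded
print(max_context(12, 6.4))
```

On these assumptions a ~6.4 GiB IQ4_XS leaves room for several thousand tokens of context, whereas a Q5/Q6 quant eats noticeably into that budget - which is exactly the effect being tested here.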

Appreciate your work, Michael!

Feel free to give feedback here once you tried it :) The imatrix generated nicely and after one more model it should start getting generated at https://huggingface.co/mradermacher/Athena-v4-i1-GGUF

mradermacher changed discussion status to closed

I've tested the new Athena-v4.i1-IQ4_XS.gguf and the Q5_K_M flavor - both behave differently from vanilla Athena-v4. They're more to the point, as if all the model's traits have been slightly sharpened. I know that's a weird, vague description - but that's my feeling here. All comparable tests (SillyTavern-based chat interactions with certain cards, asking the same questions over and over) show an improved sense of understanding, but they also inherit the original's flaws - or safety bias. They're more easily tripped, but once that's side-stepped, these variants react better - clearer, and less confusing.

Overall - it seems that imatrix-quantized models are indeed better in a 1:1 comparison. It's all vague, but that's my impression of these two variants. The IQ4_XS even feels almost on par with the larger quants - less prone to confusion/errors. Thx!

The imatrix quants should preserve the original model better. It's indeed very unexpected that quants would improve on the original, but without doubt, quants do change things (just usually for the worse :). As for the vagueness, yeah, I wish there was an objective way to measure such things, but there isn't, so that's all we have.

The important take-away for me is that my way of making imatrix data is not disastrously bad. So, thanks a lot!

You're welcome!
And I sadly misspoke/mistyped - I meant "better than a comparable Q4/Q5 quant". The original fp16 model is probably more coherent - but I can't load that monster into my tiny VRAM (and splitting it yields awful inference speed; it's simply too large =)

But I can absolutely confirm that your i-quants work - and they work better than anything else in their "quant-bracket".

Then they work as they should, which is reassuring (there have been a few measurements of K-L divergence and a few human tests of my imatrix quants, so I feel more confident nowadays, but in the beginning I had no idea whether my training data was even remotely adequate).
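For readers unfamiliar with the K-L divergence check mentioned here: the idea is to compare the full-precision model's next-token distribution with the quantized model's at the same positions, averaged over an evaluation text - a low value means the quant tracks the original closely. The logits below are made-up toy values just to show the computation; in practice both distributions would come from running the two models on the same data (llama.cpp ships tooling for such comparisons).

```python
# Toy sketch of K-L divergence between a "full-precision" and a
# "quantized" next-token distribution. The logits are invented
# stand-ins, not real model outputs.
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def kl_divergence(p, q):
    # D_KL(P || Q) = sum_i p_i * log(p_i / q_i)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = softmax([2.0, 1.0, 0.1])   # "fp16" distribution (toy values)
q = softmax([1.9, 1.1, 0.2])   # "quantized" distribution (toy values)
print(kl_divergence(p, q))     # small value -> quant tracks the original
```

Note the asymmetry: D_KL(P || Q) penalizes the quant for assigning low probability where the original is confident, which matches what we care about when judging quant fidelity.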
