Recommendations
Just curious: would you recommend the IQ4 over the Q8 model for creative/story-building tasks? Is there a specific reason? Wouldn't Q8 be much higher quality?
The IQ4s, Q4s and IQ3 have stronger NEO dataset changes, whereas Q8 has none.
Likewise, the "math" (how tokens are processed) is more varied in the lower quants, especially with Imatrix, whereas Q8/Q6 is relatively "flat".
IQ4XS and IQ4NL are very different relative to the other Q and IQ quants, and generate the most interesting prose.
That being said, you might get better quality at higher quants in terms of word choice, depth and other characteristics.
TO TEST:
Use the same prompt with different quants.
Test at TEMP=0 for each quant (regen a few times to clear caching).
(Use a separate/new chat for each quant - DO NOT MIX.)
This will give you a "CORE" test so you can compare generation differences between quants.
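
If you would rather script the core test than click through it, here is a minimal sketch in Python. It is backend-agnostic: `generate` is a hypothetical callable that you supply yourself (wrapping whatever app or API you load the quants with), and the names `run_core_test` and `compare_outputs` are made up for illustration. It assumes each `generate` call starts a fresh chat, and it scores how close two outputs are with a plain character-level similarity ratio, which is only a rough stand-in for reading the prose yourself.

```python
from difflib import SequenceMatcher
from itertools import combinations
from typing import Callable, Dict, List, Tuple

# Hypothetical signature (an assumption, not a real library API):
# generate(quant, system_prompt, prompt, temperature) -> generated text
GenerateFn = Callable[[str, str, str, float], str]


def run_core_test(generate: GenerateFn, quants: List[str], prompt: str,
                  system_prompt: str = "", regens: int = 3) -> Dict[str, str]:
    """Run the same prompt on each quant at temperature 0, keeping the last regen."""
    outputs: Dict[str, str] = {}
    for quant in quants:
        text = ""
        for _ in range(regens):
            # Each call is assumed to open a new chat, so quants are never mixed.
            text = generate(quant, system_prompt, prompt, 0.0)
        outputs[quant] = text
    return outputs


def compare_outputs(outputs: Dict[str, str]) -> List[Tuple[str, str, float]]:
    """Return (quant_a, quant_b, similarity) per pair; 1.0 means identical text."""
    return [(a, b, SequenceMatcher(None, outputs[a], outputs[b]).ratio())
            for a, b in combinations(outputs, 2)]
```

To use it, pass your own `generate` wrapper and a list of quant labels (for example `["Q8", "IQ4XS", "IQ4NL"]`), then read the outputs side by side. The similarity number only tells you which quants drift furthest from each other, not which prose is better.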
Special NOTE:
Suggest you use the system prompt for "creative" (even for the core test!), as without it this Mistral model seems "flat", regardless of the prompt/quant, unless you include prose instructions / prose directives.