Huge discrepancy in similarity scores between original model and GGUF:f32 variant

#1
by Kosmonik - opened

Hello,

I've noticed a large difference in vectors and similarity scores between the original model and its GGUF:f32 export. I followed the inference script provided in the model card without modification. Here are the results I obtained:

  • Original model:
    [0.9360030889511108, 0.8591322898864746, 0.7285829782485962]
  • GGUF:f32 model:
    [0.8376378056000809, 0.16814875359639678, 0.13949836612331074]

Is it expected to see this level of quality degradation when exporting to GGUF (f32)? I understand that some drop in quality is normal, but this difference seems unusually large.
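A minimal sketch of how the gap could be quantified, assuming the three scores are cosine similarities for the same three text pairs in both runs. The `cosine_similarity` helper and the `vec_original` / `vec_gguf` placeholder vectors are illustrative assumptions, not part of the model card's script. Comparing the two backends' embeddings of the same text directly is one way to check whether the vectors themselves diverge.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Similarity scores reported in the post above.
original_scores = [0.9360030889511108, 0.8591322898864746, 0.7285829782485962]
gguf_f32_scores = [0.8376378056000809, 0.16814875359639678, 0.13949836612331074]

# Per-pair gap between the two backends.
score_gaps = [o - g for o, g in zip(original_scores, gguf_f32_scores)]
for i, gap in enumerate(score_gaps):
    print(f"pair {i}: original - gguf = {gap:.4f}")

# Hypothetical placeholders: replace with the embedding of the SAME text
# produced by the original model and by the GGUF export.
vec_original = [0.1, 0.3, -0.2, 0.9]
vec_gguf = [0.1, 0.3, -0.2, 0.9]

# A value well below 1.0 here would mean the two backends disagree
# on the vector itself, not just on the final similarity score.
agreement = cosine_similarity(vec_original, vec_gguf)
print(f"same-text agreement: {agreement:.4f}")
```

If the same-text agreement is low, the likely suspects are outside the weights themselves (for example tokenization, prompt prefixes, pooling, or normalization), though that is an assumption to verify rather than a confirmed cause.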

Thank you for any guidance!

Environment

  • Hardware: MacBook M1 Max (32 GB RAM)
  • ollama: 0.8.0

https://t.me/evilfreelancer/1303

| Name | Original | GGUF f32 | GGUF f16 | GGUF bf16 | GGUF q8_0 | GGUF tq2_0 | GGUF tq1_0 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| STSBTask | 0.8444961710957753 | 0.7736137897844315 | 0.7736028484956837 | 0.7735618755674672 | 0.773361958282741 | 0.3647773184116074 | 0.3647773184116074 |
| ParaphraserTask | 0.760147408288547 | 0.5760219715717907 | 0.5760429923959474 | 0.5760224851600871 | 0.5761467826017955 | 0.11769859613134119 | 0.11769859613134119 |
| XnliTask | 0.4796407185628742 | 0.4165668662674651 | 0.4165668662674651 | 0.4167664670658683 | 0.4155688622754491 | 0.3393213572854291 | 0.3393213572854291 |
| SentimentTask | 0.836 | 0.7983333333333333 | 0.7976666666666666 | 0.7993333333333333 | 0.799 | 0.5176666666666667 | 0.5176666666666667 |
| ToxicityTask | 0.989866 | 0.982673 | 0.98267 | 0.9826709999999999 | 0.982684 | 0.724437 | 0.724437 |
| InappropriatenessTask | 0.8463448532935726 | 0.8284147185686138 | 0.8285707439160764 | 0.8284083284519942 | 0.828561158741147 | 0.5766776718675115 | 0.5766776718675115 |
| IntentsTask | 0.8034 | 0.7386 | 0.7386 | 0.739 | 0.7392 | 0.2868 | 0.2868 |
| IntentsXTask | 0.781 | 0.6956 | 0.6956 | 0.6954 | 0.6942 | 0.1096 | 0.1096 |
| FactRuTask | 0.24240861548860063 | 0.12707425468979844 | 0.12693111870225876 | 0.1266266723454379 | 0.12664414521133774 | 0.05964525242263313 | 0.05964525242263313 |
| RudrTask | 0.26938953157907136 | 0.15240762044883197 | 0.15240762044883197 | 0.15409351961518972 | 0.1553384424039951 | 0.060060972146900204 | 0.060060972146900204 |
| SpeedTask (gpu) | 28.902501265207924 | - | - | - | - | - | - |
| SpeedTask (cpu) | 994.7337913513184 | 184.08727248509723 | 110.00351905822754 | 111.12988392512004 | 81.11035426457723 | 53.92512798309326 | 90.35333156585693 |
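To put the benchmark figures above in proportion, the relative drop from the Original column to the GGUF f32 column can be computed per task. This is only a sketch over a few rows copied from the table; the `relative_drop` helper is an illustrative name, not something from the benchmark code.

```python
# (original, gguf_f32) scores copied from the table above.
scores = {
    "STSBTask": (0.8444961710957753, 0.7736137897844315),
    "ParaphraserTask": (0.760147408288547, 0.5760219715717907),
    "IntentsXTask": (0.781, 0.6956),
    "FactRuTask": (0.24240861548860063, 0.12707425468979844),
}

def relative_drop(original, converted):
    """Fraction of the original score lost after conversion."""
    return (original - converted) / original

drops = {task: relative_drop(o, c) for task, (o, c) in scores.items()}

# Print tasks from the largest relative drop to the smallest.
for task, drop in sorted(drops.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{task}: {drop:.1%}")
```

In the table, the f32, f16, bf16, and q8_0 columns stay within a fraction of a percent of one another, so most of the loss already appears at the f32 step rather than growing with lower precision.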
