Great Job getting these out!

#1
by ubergarm - opened

Heya bullerwins, thanks for getting this out early. I've been experimenting with quantizing this dense 72B using the ik_llama.cpp fork.

It was helpful for me to see how you treated that pesky ffn_down tensor, whose column size isn't divisible by 256 :oof:...

If you're interested, there's some discussion with ik on the matter (and on dense vs. MoE quantization in general) here.

I haven't released any GGUFs yet, still trying to find a mix with which I'm happy haha...

Cheers!

Seems like mainline llama.cpp just pads the row with whatever it needs to fill out the blocks; not sure how efficient that is:

29 568 / 256 = 115 full blocks  (115 × 256 = 29 440)
remainder: 128 elements (padded up to a full 256-element block)
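Just to make the arithmetic concrete, here's a rough sketch of it in Python (the helper name is made up, not actual llama.cpp code; 256 is the k-quant super-block size):

```python
# Rough sketch of the padding arithmetic above. 256 is the k-quant
# super-block size; the helper itself is illustrative, not llama.cpp code.
def padded_row_size(n_per_row: int, block: int = 256):
    full, rem = divmod(n_per_row, block)
    blocks_stored = full + (1 if rem else 0)
    return full, rem, blocks_stored * block

full, rem, padded = padded_row_size(29_568)
print(full, rem, padded)  # 115, 128, 29696 -> 115 full blocks, 128 left over, padded to 29 696
```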

Oh interesting. I thought maybe you had specifically chosen Q8_0 for all the ffn_down layers because I assumed the _0-style quants work with these non-256-divisible column sizes.

Thanks for the note, so many subtle details going on!
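For reference, this is the quick divisibility check I had in mind (just a rough sketch, not llama.cpp code; it assumes the usual ggml block sizes of 32 for the _0 quants and 256 for the k-quants):

```python
# Row size of that ffn_down tensor vs. the two ggml block sizes:
# _0 quants use 32-element blocks, k-quants use 256-element super-blocks.
n_per_row = 29_568
for name, block in [("Q8_0-style (block 32)", 32), ("k-quants (block 256)", 256)]:
    q, r = divmod(n_per_row, block)
    print(f"{name}: {q} blocks, remainder {r}")
# Q8_0-style (block 32): 924 blocks, remainder 0    -> divides evenly, no padding
# k-quants (block 256):  115 blocks, remainder 128  -> needs padding
```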
