gabriellarson/Kimi-Dev-72B-GGUF · Is there any documentation about the sampling parameters maybe?

Thank you! Excellent, I can confirm it works in LMStudio. On an oldish MBP M2 with 96GB RAM I ran

$ sudo sysctl iogpu.wired_limit_mb=85000

pre-emptively, so that I could offload all of the layers to GPU/VRAM using the slider shown when loading the model (I didn't try without it). This is the Kimi-Dev-72B-Q5_0 quant. I get about 3 tps.

Is there any documentation about the sampling parameters, maybe? I never learned whether any of that metadata is passed along as extra info with the weights in the GGUF or not. If anyone can confirm or deny (and teach me), it would be much appreciated, thanks. I understand the GGUF weights are probably mmap-ed, but that does not preclude putting the info there too.
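One way to check this for myself would be to dump the key/value metadata that sits in the GGUF header ahead of the tensor data. Below is a minimal sketch in Python, assuming the GGUF v2/v3 little-endian layout as I understand it from the spec (magic, version, tensor count, KV count, then typed key/value pairs). The function name `read_gguf_metadata` and the helpers are my own, not part of any library, and I have not verified which keys (if any) this particular model ships for sampling.

```python
import struct
import sys

# Scalar GGUF value type ids mapped to little-endian struct formats
# (assumed from the GGUF spec: 0..7 and 10..12 are scalars).
_SCALARS = {0: "<B", 1: "<b", 2: "<H", 3: "<h", 4: "<I", 5: "<i",
            6: "<f", 7: "<?", 10: "<Q", 11: "<q", 12: "<d"}
_STRING, _ARRAY = 8, 9


def _read(f, fmt):
    """Read one fixed-size value described by a struct format."""
    return struct.unpack(fmt, f.read(struct.calcsize(fmt)))[0]


def _read_string(f):
    """GGUF strings are a uint64 byte length followed by UTF-8 bytes."""
    length = _read(f, "<Q")
    return f.read(length).decode("utf-8", errors="replace")


def _read_value(f, vtype):
    """Read one metadata value of the given GGUF type id."""
    if vtype in _SCALARS:
        return _read(f, _SCALARS[vtype])
    if vtype == _STRING:
        return _read_string(f)
    if vtype == _ARRAY:
        elem_type = _read(f, "<I")
        count = _read(f, "<Q")
        return [_read_value(f, elem_type) for _ in range(count)]
    raise ValueError(f"unknown GGUF value type {vtype}")


def read_gguf_metadata(path):
    """Return the header key/value metadata as a dict; tensors are not read."""
    with open(path, "rb") as f:
        if f.read(4) != b"GGUF":
            raise ValueError("not a GGUF file")
        version = _read(f, "<I")
        if version < 2:
            raise ValueError("this sketch only handles GGUF v2 and later")
        _read(f, "<Q")              # tensor count, unused here
        kv_count = _read(f, "<Q")   # number of metadata key/value pairs
        meta = {}
        for _ in range(kv_count):
            key = _read_string(f)
            meta[key] = _read_value(f, _read(f, "<I"))
    return meta


if __name__ == "__main__" and len(sys.argv) > 1:
    for key, value in read_gguf_metadata(sys.argv[1]).items():
        # Tokenizer arrays can be huge, so only show their length.
        shown = f"<array of {len(value)}>" if isinstance(value, list) else value
        print(key, "=", shown)
```

Running it with the path to the .gguf file as the only argument should list every key in the header, so any sampling-related entries would show up by name if they are there. Only the header is read, so it should not matter how big the weights are.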

Sampling parameters at inference time: I'm using T=0.8, TopK=40, RepeatPenalty=1.1, MinP=0.05, TopP=0.95, which LMStudio came up with. I don't know how or where from. Maybe they are embedded in the GGUF, or maybe they are the params I previously ran LMStudio with for some other model.
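For anyone wondering what those five knobs do to the next-token choice, here is a rough sketch in Python. It is a toy illustration under my own assumptions, not LMStudio's or llama.cpp's actual code: the function name `sample_token` is made up, the order in which the filters are applied is an assumption (real engines may order them differently), and temperature must be greater than 0.

```python
import math
import random


def sample_token(logits, recent_tokens, temperature=0.8, top_k=40,
                 top_p=0.95, min_p=0.05, repeat_penalty=1.1, rng=random):
    """Pick one token id from raw logits using the settings quoted above."""
    logits = list(logits)

    # RepeatPenalty: make recently generated tokens less attractive.
    for tok in set(recent_tokens):
        if logits[tok] > 0:
            logits[tok] /= repeat_penalty
        else:
            logits[tok] *= repeat_penalty

    # Temperature: values below 1 sharpen the distribution, above 1 flatten it.
    scaled = [x / temperature for x in logits]

    # Softmax (shifted by the max for numerical stability), sorted by probability.
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    ranked = sorted(((e / total, i) for i, e in enumerate(exps)), reverse=True)

    # TopK: keep only the k most likely tokens.
    ranked = ranked[:top_k]

    # TopP: keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for prob, tok in ranked:
        kept.append((prob, tok))
        cumulative += prob
        if cumulative >= top_p:
            break

    # MinP: drop tokens whose probability is below min_p times the best one.
    best = kept[0][0]
    kept = [(prob, tok) for prob, tok in kept if prob >= min_p * best]

    # Draw from what is left; choices() normalises the weights itself.
    ids = [tok for _, tok in kept]
    weights = [prob for prob, _ in kept]
    return rng.choices(ids, weights=weights, k=1)[0]
```

With `top_k=1` this reduces to greedy decoding, which is a handy sanity check when comparing settings.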

Thanks for your stellar work - much appreciated.