gguf?

#1
by AlgorithmicKing - opened

please

Unable to run even with 4090 and 3090, running out of memory.

A 16B model running out of memory on a 24GB 4090? That's interesting. I run models in the cloud (I only have a 3060 6GB locally), so I can't complain.

It's an MoE model. I can't even run the provided sample code; both GPUs are close to maxed out on memory. The 4090 has less than 1 GB free, and the load asks for another 1 GB.

> Unable to run even with 4090 and 3090, running out of memory.

That's expected... the model files already total more than 24 GB (a ~16B-parameter model in bf16 is roughly 32 GB of weights alone)... you need a quantized version.
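
Until someone publishes a GGUF, a 4-bit load through bitsandbytes might squeeze it onto your cards. Rough sketch only: `MODEL_ID` is a placeholder for this repo's id, and it assumes the checkpoint loads through transformers' `AutoModelForCausalLM`:

```python
# Sketch: 4-bit quantized load via transformers + bitsandbytes.
# MODEL_ID is a placeholder; substitute this repo's actual id.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

MODEL_ID = "org/model"  # placeholder

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    quantization_config=bnb_config,
    device_map="auto",  # shards layers across the 4090 + 3090
)

inputs = tokenizer("Hello, world", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```

At 4 bits, ~16B parameters is on the order of 8-10 GB of weights, so it should fit on one 24 GB card with room for KV cache, but no guarantees for this particular architecture.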

My dual-4090 rig also went poooooff, OOM. Any ideas? xD

Yes, I'd like a GGUF. I'd like to test out this model.
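
Once a GGUF does show up, running it should look roughly like this with llama-cpp-python (the filename and quant level are placeholders, not a real release):

```python
# Sketch: running a (hypothetical) GGUF of this model with llama-cpp-python.
from llama_cpp import Llama

llm = Llama(
    model_path="model-Q4_K_M.gguf",  # placeholder filename
    n_gpu_layers=-1,  # offload all layers to the GPU
    n_ctx=4096,
)

out = llm("Write a haiku about VRAM.", max_tokens=64)
print(out["choices"][0]["text"])
```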

Same here, if you get one working. Converting/assembling the safetensors shards is prone to failure, and I'm not sure I'm getting it right.
