GGUF?
please
Unable to run even with 4090 and 3090, running out of memory.
A 16B model running out of memory on a 24 GB 4090? That's interesting. Well, I run models in the cloud (since I only have a 6 GB 3060 locally), so I can't complain.
It's an MoE model. I can't even run the sample code provided, and both GPUs are close to maxing out their VRAM. The 4090 has less than 1 GB left and it asks for another 1 GB.
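For reference, this is roughly what I tried on the dual-GPU setup, a minimal sketch assuming the model loads through transformers/accelerate (the repo id and memory caps are placeholders, not the actual values):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repo id -- substitute the real model name.
model_id = "org/model-16b-moe"

tokenizer = AutoTokenizer.from_pretrained(model_id)

# Let accelerate split layers across both GPUs and spill the rest to CPU RAM.
# The max_memory caps are examples; leave headroom for activations and the KV cache.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    max_memory={0: "22GiB", 1: "22GiB", "cpu": "64GiB"},
)

inputs = tokenizer("Hello", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=32)[0]))
```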
That's expected... the model files in total are already over 24 GB... you need a quantized version.
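Until a GGUF shows up, one stopgap is quantizing on the fly with bitsandbytes. This is only a sketch, assuming the repo works with transformers + bitsandbytes (untested on this model; the repo id is a placeholder):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "org/model-16b-moe"  # placeholder repo id

# NF4 4-bit weights shrink a ~16B model to roughly 9-10 GB,
# which should fit on a single 24 GB card with room for the KV cache.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)
```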
My dual-4090 rig also went poooooff, OOM. Any ideas? xD
Yes, I'd like a GGUF. I'd like to test out this model.
Same here if you get one. Converting and assembling the safetensors myself is prone to failure, and I'm not sure I'm doing it right.
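In case it helps, this is the rough conversion path I've been attempting. It's only a sketch: it assumes llama.cpp already supports this architecture (which may not be the case yet), and the paths are placeholders:

```python
import subprocess

model_dir = "path/to/downloaded/safetensors"  # placeholder: local HF snapshot directory
f16_gguf = "model-f16.gguf"
q4_gguf = "model-Q4_K_M.gguf"

# Step 1: merge the safetensors shards into a single f16 GGUF
# using llama.cpp's converter script (run from the llama.cpp checkout).
subprocess.run(
    ["python", "convert_hf_to_gguf.py", model_dir,
     "--outfile", f16_gguf, "--outtype", "f16"],
    check=True,
)

# Step 2: quantize the f16 GGUF down to Q4_K_M so it fits in 24 GB of VRAM.
subprocess.run(
    ["./llama-quantize", f16_gguf, q4_gguf, "Q4_K_M"],
    check=True,
)
```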