GGUF?
please
Unable to run even with 4090 and 3090, running out of memory.
A 16B model running out of memory on a 24 GB 4090? That's interesting. Well, I run models in the cloud (since I only have a 6 GB 3060 locally), so I can't complain.
It's an MoE model. I can't even run the sample code provided, and both GPUs are close to maxing out their VRAM. The 4090 has less than 1 GB left and it asks for another 1 GB.
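For reference, this is roughly what I tried on the dual-GPU setup, a minimal sketch assuming the model loads through transformers/accelerate (the repo id and memory caps are placeholders, not the actual values):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repo id -- substitute the real model name.
model_id = "org/model-16b-moe"

tokenizer = AutoTokenizer.from_pretrained(model_id)

# Let accelerate split layers across both GPUs and spill the rest to CPU RAM.
# The max_memory caps are examples; leave headroom for activations and the KV cache.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    max_memory={0: "22GiB", 1: "22GiB", "cpu": "64GiB"},
)

inputs = tokenizer("Hello", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=32)[0]))
```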
That's expected... the model files in total are already over 24 GB... you need a quantized version.
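Until a GGUF shows up, one stopgap is quantizing on the fly with bitsandbytes. This is only a sketch, assuming the repo works with transformers + bitsandbytes (untested on this model; the repo id is a placeholder):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "org/model-16b-moe"  # placeholder repo id

# NF4 4-bit weights shrink a ~16B model to roughly 9-10 GB,
# which should fit on a single 24 GB card with room for the KV cache.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)
```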
My dual-4090 rig also went poooooff, OOM. Any ideas? xD
Yes, I'd like a GGUF. I'd like to test out this model.
Same here if you get one. Converting and assembling the safetensors myself is prone to failure, and I'm not sure I'm doing it right.
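In case it helps, this is the rough conversion path I've been attempting. It's only a sketch: it assumes llama.cpp already supports this architecture (which may not be the case yet), and the paths are placeholders:

```python
import subprocess

model_dir = "path/to/downloaded/safetensors"  # placeholder: local HF snapshot directory
f16_gguf = "model-f16.gguf"
q4_gguf = "model-Q4_K_M.gguf"

# Step 1: merge the safetensors shards into a single f16 GGUF
# using llama.cpp's converter script (run from the llama.cpp checkout).
subprocess.run(
    ["python", "convert_hf_to_gguf.py", model_dir,
     "--outfile", f16_gguf, "--outtype", "f16"],
    check=True,
)

# Step 2: quantize the f16 GGUF down to Q4_K_M so it fits in 24 GB of VRAM.
subprocess.run(
    ["./llama-quantize", f16_gguf, q4_gguf, "Q4_K_M"],
    check=True,
)
```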