How much memory do I need for this model (on Windows)?
I'm trying to run this model on Windows 11 with 48 GB of RAM and no GPU.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "../Mixtral-8x7B-Instruct-v0.1"
self.model = AutoModelForCausalLM.from_pretrained(model_id, device_map="cpu", low_cpu_mem_usage=True)
self.tokenizer = AutoTokenizer.from_pretrained(model_id)
I can see RAM usage rise until this error occurs:
Loading checkpoint shards:  16%|█▌        | 3/19 [01:13<06:31, 24.49s/it]
Process finished with exit code -1073741819 (0xC0000005)
Do I need more memory, or is there something else I can do?
thx
R.
I have run this model with ChatLLM.cpp:
- For quantized int4, 32 GB of RAM is enough;
- For quantized int8, 64 GB of RAM is enough.
I think it is impossible to run it with PyTorch on CPU, because PyTorch is not as efficient as GGML on CPU.
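For intuition, here is a rough back-of-envelope estimate of the weight memory alone, assuming the commonly cited ~46.7B total parameters for Mixtral-8x7B (KV cache and runtime overhead come on top of this):

# Rough weight-memory estimate for Mixtral-8x7B at different precisions.
# 46.7e9 is the commonly cited total parameter count; treat it as an assumption.
PARAMS = 46.7e9

for precision, bytes_per_param in [("fp32", 4.0), ("fp16", 2.0), ("int8", 1.0), ("int4", 0.5)]:
    gib = PARAMS * bytes_per_param / 2**30
    print(f"{precision}: ~{gib:.0f} GiB for the weights alone")

That gives roughly 174 GiB (fp32), 87 GiB (fp16), 43 GiB (int8) and 22 GiB (int4), which is why 48 GB cannot hold the unquantized checkpoint but int4 fits comfortably.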
OK, how can I use quantized int4?
Do I have to use "load_in_4bit=True" ?
thx
R
Yes, I've just used:
self.model = AutoModelForCausalLM.from_pretrained(model_id, device_map="cpu", low_cpu_mem_usage=True, load_in_4bit=True)
self.tokenizer = AutoTokenizer.from_pretrained(model_id)
And I get an error saying that quantization is not possible without a GPU.
You are right :(
thx
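For anyone landing here later: at the time of writing, bitsandbytes' load_in_4bit requires a CUDA GPU, so it cannot quantize on CPU. A GGML-family route that stays in Python is llama-cpp-python with an int4 GGUF quantization of the model. A minimal sketch, assuming llama-cpp-python is installed and you have downloaded a Q4 GGUF file (the filename below is a placeholder):

from llama_cpp import Llama

# Placeholder path: point this at whichever Q4 GGUF quantization you downloaded.
llm = Llama(model_path="./mixtral-8x7b-instruct-v0.1.Q4_K_M.gguf", n_ctx=2048)

out = llm("[INST] What is the capital of France? [/INST]", max_tokens=64)
print(out["choices"][0]["text"])

This runs entirely on CPU and should fit in 32 GB of RAM, consistent with the numbers above.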