Running a 4-bit Quantized 7B Model on a PC: Feasibility and Insights

#109
by edw-hug-face - opened

I'm considering running the 4-bit quantized Mistral 7B model on a standard PC with 32GB RAM. This model is famed for its efficiency on mobile devices, so I'm wondering about its performance on a PC. My plan is to train it on a small local SQL database in English. Is it realistic to expect smooth operation on a consumer-grade PC? Any insights or advice would be greatly appreciated.

Running is one thing, training is another. I don't think training is feasible (or even possible with RAM), but please correct me if I'm wrong. You can use Google Colab or some other cloud run service for finetuning. There are loads of scripts to get up and running. It's even possible with the free instance, if the dataset is small and the context is short.
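Whichever environment you fine-tune in, the SQL data first has to be turned into text examples. Below is a minimal sketch using Python's built-in `sqlite3` and `json` modules. It assumes the local database is SQLite and has a hypothetical `products(name, price)` table. It also assumes your fine-tuning script accepts JSONL records with `instruction`/`output` fields; check what the script you pick actually expects. The table, the question template and the field names are all illustrative.

```python
import json
import sqlite3


def rows_to_qa_pairs(conn: sqlite3.Connection) -> list[dict]:
    """Turn rows of a hypothetical `products` table into instruction/response pairs."""
    pairs = []
    # Table and column names are illustrative assumptions; adapt them to your schema.
    for name, price in conn.execute("SELECT name, price FROM products ORDER BY name"):
        pairs.append({
            "instruction": f"What is the price of {name}?",
            "output": f"{name} costs {price:.2f}.",
        })
    return pairs


def to_jsonl(pairs: list[dict]) -> str:
    # One JSON object per line (JSONL), a common layout for fine-tuning datasets.
    return "".join(json.dumps(pair) + "\n" for pair in pairs)


if __name__ == "__main__":
    # In-memory demo database standing in for the real local SQL database.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE products (name TEXT, price REAL)")
    conn.executemany(
        "INSERT INTO products VALUES (?, ?)", [("Widget", 9.5), ("Gadget", 12.0)]
    )
    print(to_jsonl(rows_to_qa_pairs(conn)), end="")
```

A few hundred pairs like these is the kind of small, short-context dataset that fits in a free cloud instance.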

Not sure about PC, but I ran some experiments with a quantized version in the cloud; you can check the repo here: https://huggingface.co/Inferless/Mixtral-8x7B-v0.1-int8-GPTQ

@edw-hug-face you're probably thinking of Mistral, not Mixtral (which is what this repo is).
Mixtral has about 46B parameters, not 7B.
Mistral is very good for its size as well, but obviously worse than Mixtral.
Mixtral will definitely not run on a normal phone, and even at 4-bit it will take around 24GB of RAM/VRAM.

Training with QLoRA (the fine-tuning method that needs the least memory) takes about 2x that, so roughly 48GB of RAM/VRAM.

Mistral 7B can run in just 4GB of VRAM/RAM and be fine-tuned in about 12GB, so yes, that's possible.
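The memory figures in these replies come from simple arithmetic: weight memory is roughly parameters × bits per weight ÷ 8 bytes. The sketch below assumes Mistral AI's published parameter counts, about 46.7B total for Mixtral 8x7B and about 7.3B for Mistral 7B. The 2x QLoRA factor is this thread's rule of thumb, not a measured value. The sketch ignores quantization overhead (scales and zero-points push real 4-bit files slightly above 4 bits per weight), the KV cache and activations.

```python
# Approximate total parameter counts in billions (assumption: Mistral AI's published figures).
MODELS = {"Mixtral 8x7B": 46.7, "Mistral 7B": 7.3}


def weight_memory_gb(params_billion: float, bits: float) -> float:
    """Size of the weights alone in GB (1 GB = 1e9 bytes): params * bits / 8."""
    return params_billion * bits / 8


def qlora_estimate_gb(params_billion: float, bits: float = 4) -> float:
    """Rough rule of thumb from this thread: QLoRA needs about 2x the quantized weights.

    Real usage also depends on batch size, sequence length and LoRA rank.
    """
    return 2 * weight_memory_gb(params_billion, bits)


if __name__ == "__main__":
    for name, params in MODELS.items():
        print(
            f"{name}: fp16 {weight_memory_gb(params, 16):.1f} GB, "
            f"4-bit {weight_memory_gb(params, 4):.1f} GB, "
            f"QLoRA ~{qlora_estimate_gb(params):.1f} GB"
        )
```

By this arithmetic, 4-bit Mixtral lands around 23GB and 4-bit Mistral 7B under 4GB, in line with the numbers above. The 12GB quoted for fine-tuning Mistral is above the 2x estimate. For small models, activations and sequence length can add substantially on top of the weights.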

However, CPU is horribly slow (training takes about a day for just a few hundred Q&A pairs).

So I recommend using Colab, since it provides a GPU with 15GB of VRAM, which will massively increase speed.

Actually, with https://github.com/unslothai/unsloth, you can get quite a bit under 12GB, maybe down to 8-9GB for a 7B model.
