🥲 Failed to load the model
Latest beta LM Studio and latest CUDA 12 runtime (Windows).
error: 🥲 Failed to load the model
Error loading model.
(Exit code: 18446744072635812000). Unknown error. Try a different model and/or config.
It is linked to CPU offloading. Once you reduce the context enough to fit within VRAM, it loads and runs with no issue, but CPU offloading is broken for me.
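For reference, the equivalent experiment in plain llama.cpp would look roughly like this (a sketch only; the model filename and the context/offload values below are placeholders, not my exact settings):

# everything on the GPU: shrink the context until the model plus KV cache fits in VRAM
llama-cli -m gemma-3-12b-it-qat-Q3_K_XL.gguf -c 4096 -ngl 99 -p "Hello"

# partial CPU offload: keep a bigger context but offload fewer layers to the GPU
llama-cli -m gemma-3-12b-it-qat-Q3_K_XL.gguf -c 16384 -ngl 20 -p "Hello"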
I had a similar issue with Google's QAT models: they would load, but when processing the prompt they would just get stuck after a certain percentage and not process the rest, maybe also due to issues processing through the CPU/RAM?
Is there even any point in using these models, then, if I have to choose a much lower quant to fit within VRAM? You probably end up with worse intelligence for the same size, no?
That was Q3_K_XL, I think. I deleted it and tried IQ2_XXS, and even though it comfortably fits in VRAM (RTX 3080, 10 GB), it fails right at the point where it seems fully loaded.
Debug log analysis according to o4-mini:
The culprit here is buried in these two adjacent blocks:
clip_ctx: CLIP using CUDA0 backend
clip_model_loader: model name: Gemma-3-12B-It-Qat
…
load_hparams: vision_encoder: 1
…
load_hparams: model size: 134217949.38 MiB
…
llama.cpp abort:2743: GGML_ASSERT(ggml_can_mul_mat(a, b)) failed
What’s happening is that LM Studio is trying to spin up a CLIP-style vision encoder in addition to your text LLM, and—by misconfiguration—it’s pointed at your Gemma-3-12B-It-Qat .gguf file instead of an actual CLIP model. You can see that the loader thinks your Gemma model has a “vision_encoder” component and reports an absurd size (~128 TB!), which obviously isn’t right. When llama.cpp then tries to do a matrix multiply with tensors whose dimensions don’t line up, it trips the ggml_can_mul_mat(a, b) assertion and aborts.
How to fix
Disable the vision encoder if you don’t need any image-to-text capability: in your LM Studio project settings, turn off or remove the CLIP/Vision encoder entry so that only the LLM is loaded.
Point to a real CLIP model if you do intend to use a multimodal (vision+LLM) setup: download an appropriate GGUF CLIP ViT file (e.g. clip-vit-base-patch32.gguf) and, in your project’s “Vision Encoder” slot, reference that file instead of the Gemma LLM.
Once the CLIP loader is either disabled or correctly configured, the mismatched-shapes error will go away and your Gemma model will load normally.
Same with the 4B model. Is it just me??
Apparently, according to some people, you'll need to wait for the latest LM Studio update. Do you know if it runs fine in llama.cpp?
It fails with both the F16 and F32 mmproj using the latest llama-mtmd-cli from the llama.cpp releases (b5216). It works when switching to https://huggingface.co/lmstudio-community/gemma-3-12b-it-GGUF/blob/main/mmproj-model-f16.gguf.
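For anyone trying to reproduce this, the invocations would look roughly like the following (file names are placeholders, and the flags should be double-checked against your build's llama-mtmd-cli help output):

# fails for me with the mmproj files shipped alongside the QAT upload (F16 or F32)
llama-mtmd-cli -m gemma-3-12b-it-qat.gguf --mmproj mmproj-f16.gguf --image test.png -p "Describe this image" -ngl 99

# works after swapping in the lmstudio-community mmproj-model-f16.gguf linked above
llama-mtmd-cli -m gemma-3-12b-it-qat.gguf --mmproj mmproj-model-f16.gguf --image test.png -p "Describe this image" -ngl 99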
I can't figure out how to disable the vision part in LM Studio if I'm using the chat/server capabilities in the program.
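If it helps, in plain llama.cpp the vision part is effectively disabled by just not passing an mmproj file at all, something like the following (model filename is a placeholder):

# no --mmproj given, so only the text model is loaded
llama-cli -m gemma-3-12b-it-qat.gguf -c 4096 -ngl 99 -p "Hello"

I don't know whether LM Studio exposes an equivalent toggle in its chat/server settings.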
@danihend Please do not post "fixes" you get from LLMs without verifying them first.
I didn't post a fix; I just shared o4-mini's analysis.
That's why I said "Debug log analysis according to o4-mini:"
Anyone browsing this forum is certainly grown up enough to decide whether they believe it or not. Did you take some irreversible action based on its advice or what?
@danihend Your message contains more text than the debug log, and you also wrote "how to fix". This is a hallucination, and thus spam. Stop posting LLM hallucinations here. You meant to be helpful, great, thanks for that. But hallucinations are not helpful.
I didn't write that, o4 did. I am not "posting hallucinations". Please learn how to interact with people in a normal way, or just keep it to yourself.