🥲 Failed to load the model

#7
by danihend - opened

Latest beta LM Studio and latest CUDA 12 runtime (Windows).

error: 🥲 Failed to load the model
Error loading model.
(Exit code: 18446744072635812000). Unknown error. Try a different model and/or config.

It is linked to CPU offloading. Once I reduce the context so everything fits within VRAM, the model loads and runs with no issue, but CPU offloading is broken for me.
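For anyone who wants to reproduce the offloading side of this outside LM Studio, the equivalent llama.cpp invocation would be something like the sketch below. The file name, layer count, and context size are placeholders, not a verified repro:

# Hypothetical partial-offload run: -ngl sets how many layers go to the GPU
# (the rest stay on CPU/RAM), and -c sets the context size; lowering -c
# shrinks the KV cache so everything fits in VRAM.
llama-cli -m gemma-3-12b-it-Q3_K_XL.gguf -ngl 30 -c 4096 -p "Hello"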

I had a similar issue with Google's QAT models: they would load, but while processing the prompt they would get stuck at a certain percentage and never process the rest, maybe also due to issues with offloading through the CPU/RAM?

Is there even any point in using these models, then, if I have to choose a much lower quant to fit within VRAM? You probably end up with worse intelligence for the same size, no?

That was Q3_K_XL, I think. I deleted it and tried IQ2_XXS, and even though it comfortably fits in VRAM (RTX 3080, 10 GB), it fails right at the point where it seems fully loaded.

Debug log analysis according to o4-mini:
The culprit here is buried in these two adjacent blocks:

clip_ctx: CLIP using CUDA0 backend
clip_model_loader: model name:   Gemma-3-12B-It-Qat
…
load_hparams: vision_encoder:     1
…
load_hparams: model size:         134217949.38 MiB
…
llama.cpp abort:2743: GGML_ASSERT(ggml_can_mul_mat(a, b)) failed

What’s happening is that LM Studio is trying to spin up a CLIP-style vision encoder in addition to your text LLM, and—by misconfiguration—it’s pointed at your Gemma-3-12B-It-Qat .gguf file instead of an actual CLIP model. You can see that the loader thinks your Gemma model has a “vision_encoder” component and reports an absurd size (~128 TB!), which obviously isn’t right. When llama.cpp then tries to do a matrix multiply with tensors whose dimensions don’t line up, it trips the ggml_can_mul_mat(a, b) assertion and aborts.
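Whatever one makes of that analysis, the assertion itself is real: ggml_can_mul_mat is ggml's shape-compatibility check for matrix multiplication. From memory of the ggml source, it looks roughly like this (paraphrased, not copied from a specific release):

static inline bool ggml_can_mul_mat(const struct ggml_tensor * t0, const struct ggml_tensor * t1) {
    // The inner ("row") dimensions must match, and t0 must be
    // broadcastable over t1's batch dimensions.
    return (t0->ne[0]             == t1->ne[0]) &&
           (t1->ne[2] % t0->ne[2] == 0)         &&
           (t1->ne[3] % t0->ne[3] == 0);
}

In other words, the abort means two tensors with incompatible shapes reached a matrix multiply during load.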


How to fix

  1. Disable the vision encoder if you don’t need any image-to-text capability.
    In your LM Studio project settings, turn off or remove the CLIP/Vision encoder entry so that only the LLM is loaded.

  2. Point to a real CLIP model if you do intend to use a multimodal (vision+LLM) setup.
    Download an appropriate GGUF CLIP ViT file (e.g. clip-vit-base-patch32.gguf) and in your project’s “Vision Encoder” slot, reference that file instead of the Gemma LLM.

Once the CLIP loader is either disabled or correctly configured, the mismatched-shapes error will go away and your Gemma model will load normally.

Same with the 4B model. Is it just me??

Unsloth AI org

Apparently, according to some people, you'll need to wait for the latest LM Studio update. Do you know if it runs fine in llama.cpp?

It fails with both the F16 and F32 mmproj files using the latest llama-mtmd-cli from the llama.cpp releases (b5216). It works when switching to https://huggingface.co/lmstudio-community/gemma-3-12b-it-GGUF/blob/main/mmproj-model-f16.gguf.
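For anyone reproducing this, the test invocation was along these lines; the model, mmproj, and image file names are placeholders for whatever you have locally:

# Hypothetical test with llama.cpp's multimodal CLI (release b5216):
# -m is the text model, --mmproj the vision projector, --image the input image.
llama-mtmd-cli -m gemma-3-12b-it-Q4_K_M.gguf --mmproj mmproj-model-f16.gguf --image test.jpg -p "Describe this image."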

I can't figure out how to disable the vision part in LM Studio while using the chat/server capabilities of the program.

@danihend Please do not post "fixes" you get from LLMs without verifying them first.

I didn't post a fix; I just shared o4-mini's analysis.

That's why I said "Debug log analysis according to o4-mini:"

Anyone browsing this forum is certainly grown up enough to decide whether they believe it or not. Did you take some irreversible action based on its advice or what?

@danihend Your message contains more text than the debug log, and you also wrote "how to fix". This is a hallucination, and thus spam. Stop posting LLM hallucinations here. You meant to be helpful, great, thanks for that. But hallucinations are not helpful.

I didn't write that, o4 did. I am not "posting hallucinations". Please learn how to interact with people in a normal way, or just keep it to yourself.

Unsloth AI org

Ok guys, LM Studio has released an update today! Please install the latest version of LM Studio and it should now work! I've tried it myself and it is fully functional now! They now properly support llama.cpp's latest update for vision models.

CC: @bedobedo @danihend @Moleculo @nutspiano

@danihend You didn't write it, an LLM did? And posted it under your name? Very well. It sounds like you are confused about who is responsible for what you post here.

I will repeat what I said: do not post hallucinations you have generated with language models without checking them first. It is spam, and not helpful.

Are you being intentionally dumb or what?

@danihend I cannot force you to be smart, nor nice. But you are posting under your full name, so you should have some incentive to.

I have no problem calling out stupidity under my own name.
