🥲 Failed to load the model

#7
by danihend - opened

Latest beta LM Studio and latest CUDA 12 runtime (Windows).

error: 🥲 Failed to load the model
Error loading model.
(Exit code: 18446744072635812000). Unknown error. Try a different model and/or config.

It is linked to CPU offloading. Once I reduce the context so everything fits within VRAM, the model loads and runs with no issue, but CPU offloading is broken for me.
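For anyone who wants to reproduce the offloading side of this outside LM Studio, the equivalent llama.cpp invocation would be something like the sketch below. The file name, layer count, and context size are placeholders, not a verified repro:

# Hypothetical partial-offload run: -ngl sets how many layers go to the GPU
# (the rest stay on CPU/RAM), and -c sets the context size; lowering -c
# shrinks the KV cache so everything fits in VRAM.
llama-cli -m gemma-3-12b-it-Q3_K_XL.gguf -ngl 30 -c 4096 -p "Hello"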

I had a similar issue with Google's QAT models: they would load, but while processing the prompt they would get stuck at a certain percentage and never process the rest, maybe also due to issues with offloading through the CPU/RAM?

Is there even any point in using these models, then, if I have to choose a much lower quant to fit within VRAM? You probably end up with worse intelligence for the same size, no?

That was Q3_K_XL, I think. I deleted it and tried IQ2_XXS, and even though it comfortably fits in VRAM (RTX 3080, 10 GB), it fails right at the point where it seems fully loaded.

Debug log analysis according to o4-mini:
The culprit here is buried in these two adjacent blocks:

clip_ctx: CLIP using CUDA0 backend
clip_model_loader: model name:   Gemma-3-12B-It-Qat
…
load_hparams: vision_encoder:     1
…
load_hparams: model size:         134217949.38 MiB
…
llama.cpp abort:2743: GGML_ASSERT(ggml_can_mul_mat(a, b)) failed

What’s happening is that LM Studio is trying to spin up a CLIP-style vision encoder in addition to your text LLM, and—by misconfiguration—it’s pointed at your Gemma-3-12B-It-Qat .gguf file instead of an actual CLIP model. You can see that the loader thinks your Gemma model has a “vision_encoder” component and reports an absurd size (~128 TB!), which obviously isn’t right. When llama.cpp then tries to do a matrix multiply with tensors whose dimensions don’t line up, it trips the ggml_can_mul_mat(a, b) assertion and aborts.
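Whatever one makes of that analysis, the assertion itself is real: ggml_can_mul_mat is ggml's shape-compatibility check for matrix multiplication. From memory of the ggml source, it looks roughly like this (paraphrased, not copied from a specific release):

static inline bool ggml_can_mul_mat(const struct ggml_tensor * t0, const struct ggml_tensor * t1) {
    // The inner ("row") dimensions must match, and t0 must be
    // broadcastable over t1's batch dimensions.
    return (t0->ne[0]             == t1->ne[0]) &&
           (t1->ne[2] % t0->ne[2] == 0)         &&
           (t1->ne[3] % t0->ne[3] == 0);
}

In other words, the abort means two tensors with incompatible shapes reached a matrix multiply during load.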


How to fix

  1. Disable the vision encoder if you don’t need any image-to-text capability.
    In your LM Studio project settings, turn off or remove the CLIP/Vision encoder entry so that only the LLM is loaded.

  2. Point to a real CLIP model if you do intend to use a multimodal (vision+LLM) setup.
    Download an appropriate GGUF CLIP ViT file (e.g. clip-vit-base-patch32.gguf) and in your project’s “Vision Encoder” slot, reference that file instead of the Gemma LLM.

Once the CLIP loader is either disabled or correctly configured, the mismatched-shapes error will go away and your Gemma model will load normally.

Same with the 4B model. Is it just me??

Unsloth AI org

Apparently, according to some people, you'll need to wait for the latest LM Studio update. Do you know if it runs fine in llama.cpp?

It fails with both the F16 and F32 mmproj files using the latest llama-mtmd-cli from the llama.cpp releases (b5216). It works when switching to https://huggingface.co/lmstudio-community/gemma-3-12b-it-GGUF/blob/main/mmproj-model-f16.gguf.
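For anyone reproducing this, the test invocation was along these lines; the model, mmproj, and image file names are placeholders for whatever you have locally:

# Hypothetical test with llama.cpp's multimodal CLI (release b5216):
# -m is the text model, --mmproj the vision projector, --image the input image.
llama-mtmd-cli -m gemma-3-12b-it-Q4_K_M.gguf --mmproj mmproj-model-f16.gguf --image test.jpg -p "Describe this image."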

I can't figure out how to disable the vision part in LM Studio while using the chat/server capabilities of the program.

@danihend Please do not post "fixes" you get from LLMs without verifying them first.

I didn't post a fix; I just shared o4-mini's analysis.

That's why I said "Debug log analysis according to o4-mini:"

Anyone browsing this forum is certainly grown up enough to decide whether they believe it or not. Did you take some irreversible action based on its advice or what?

@danihend Your message contains more text than the debug log, and you also wrote "how to fix". This is a hallucination, and thus spam. Stop posting LLM hallucinations here. You meant to be helpful, great, thanks for that. But hallucinations are not helpful.

I didn't write that, o4 did. I am not "posting hallucinations". Please learn how to interact with people in a normal way, or just keep it to yourself.

Unsloth AI org

Ok guys, LM Studio has released an update today! Please install the latest version of LM Studio and it should now work! I've tried it myself and it is fully functional now! They now properly support llama.cpp's latest update for vision models.

CC: @bedobedo @danihend @Moleculo @nutspiano

@danihend You didn't write it, an LLM did? And posted it under your name? Very well. It sounds like you are confused about who is responsible for what you post here.

I will repeat what I said: do not post hallucinations you have generated with language models without checking them first. It is spam, and not helpful.

Are you being intentionally dumb or what?

@danihend I cannot force you to be smart, nor nice. But you are posting under your full name, so you should have some incentive to.

I have no problem calling out stupidity under my own name.
