Getting "tensor 'blk.0.ffn_down_exps.weight' has invalid ggml type 39 (NONE)" error when trying to load the model in llama.cpp

#4 · opened by KernelDebugger

Hello. I'm trying to load the model in llama.cpp (CLI and server) and getting this error:

srv load_model: loading model '../../../gpt-oss-20b-F16.gguf'
llama_model_load_from_file_impl: using device CUDA0 (NVIDIA GeForce RTX 4070 Ti SUPER) - 15429 MiB free
gguf_init_from_file_impl: tensor 'blk.0.ffn_down_exps.weight' has invalid ggml type 39 (NONE)
gguf_init_from_file_impl: failed to read tensor info
llama_model_load: error loading model: llama_model_loader: failed to load model from ../../../gpt-oss-20b-F16.gguf
llama_model_load_from_file_impl: failed to load model
common_init_from_params: failed to load model '../../../gpt-oss-20b-F16.gguf'
srv load_model: failed to load model, '../../../gpt-oss-20b-F16.gguf'
srv operator(): operator(): cleaning up before exit...
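From what I can tell, type 39 here is GGML_TYPE_MXFP4, the new quantization type ggml added for the gpt-oss models; a llama.cpp build from before that addition doesn't recognize the id and reports it as NONE (that mapping is my assumption from the current ggml headers). A quick way to see which build you're on:

# Print the llama.cpp build version (binary path is illustrative; adjust to your install)
./build/bin/llama-cli --version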

I'm also seeing this. In the other thread it was mentioned that a new version is coming with chat-template fixes. Hopefully that fix also covers this? If not, this needs attention as well!

I think I’ve downloaded the fixed version, from exactly the link that says "This is the new MXFP4_MOE quant renamed to F16 with our chat template fixes! Use GGUFs here."
And it still shows this error.

Referring to this comment: https://huggingface.co/unsloth/gpt-oss-20b-GGUF/discussions/2#68927da38491c63d06a29dd6

That comment was posted 20 minutes ago, the models were uploaded an hour ago, and the README was updated around 40 minutes ago, so I'm assuming another upload is coming. But I'm not 100% sure.

But, as mentioned, I'm also seeing this error. So I'm just waiting for fixes.

Unsloth AI org

Hello! You guys need to recompile and update llama.cpp!!

@JamesMowery @KernelDebugger
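For reference, the standard rebuild looks like this (repo URL is llama.cpp's official one; the CUDA flag is only needed if you want the CUDA backend, and paths are illustrative):

# Fetch the latest sources
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
# Configure with the CUDA backend and build in Release mode
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release -j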

Yeah, just as I thought. Thanks!

Hello! You guys need to recompile and update llama.cpp!!

@JamesMowery @KernelDebugger

I'm on version b6092-1, which was just released. Is there another version or update coming? If so, I'll just wait for the official release to hit Arch.

Edit: Looks like I do need to wait a bit longer for it to hit Arch. That isn't the latest. Thanks!
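If you're on the distro package rather than a source build, something like this shows what's installed (I'm assuming the Arch package is named llama.cpp; adjust if yours differs):

# Query the installed package details, including the version, on Arch
pacman -Qi llama.cpp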

Just rebuilt llama.cpp from source; everything works perfectly now, thanks!
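For anyone following along, a quick smoke test after rebuilding (model path and prompt are just examples):

# Load the model and generate a few tokens to confirm it now parses the MXFP4 tensors
./build/bin/llama-cli -m ../../../gpt-oss-20b-F16.gguf -p "Hello" -n 32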

KernelDebugger changed discussion status to closed

Welp, guess I'm stuck with not using these models.
