llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'gemma3n'
#3 opened by Yutis
I am using the Google Colab notebook provided in this HF repo (https://colab.research.google.com/#fileId=https%3A//huggingface.co/unsloth/gemma-3n-E2B-it-GGUF.ipynb), but I am getting the error below:
./gemma-3n-E2B-it-UD-IQ2_XXS.gguf: 100% 2.05G/2.05G [00:29<00:00, 54.2MB/s]
llama_model_loader: loaded meta data with 51 key-value pairs and 727 tensors from /root/.cache/huggingface/hub/models--unsloth--gemma-3n-E2B-it-GGUF/snapshots/4a272108ff14d0e1cef82baa8025a60a91307348/./gemma-3n-E2B-it-UD-IQ2_XXS.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = gemma3n
llama_model_loader: - kv 1: general.type str = model
llama_model_loader: - kv 2: general.name str = Gemma-3N-E2B-It
llama_model_loader: - kv 3: general.finetune str = 3n-E2B-it
llama_model_loader: - kv 4: general.basename str = Gemma-3N-E2B-It
llama_model_loader: - kv 5: general.quantized_by str = Unsloth
llama_model_loader: - kv 6: general.size_label str = 4.5B
llama_model_loader: - kv 7: general.license str = gemma
llama_model_loader: - kv 8: general.repo_url str = https://huggingface.co/unsloth
llama_model_loader: - kv 9: general.base_model.count u32 = 1
llama_model_loader: - kv 10: general.base_model.0.name str = Gemma 3n E2B It
llama_model_loader: - kv 11: general.base_model.0.organization str = Google
llama_model_loader: - kv 12: general.base_model.0.repo_url str = https://huggingface.co/google/gemma-3...
llama_model_loader: - kv 13: general.tags arr[str,6] = ["automatic-speech-recognition", "uns...
llama_model_loader: - kv 14: gemma3n.context_length u32 = 32768
llama_model_loader: - kv 15: gemma3n.embedding_length u32 = 2048
llama_model_loader: - kv 16: gemma3n.block_count u32 = 30
llama_model_loader: - kv 17: gemma3n.feed_forward_length arr[i32,30] = [8192, 8192, 8192, 8192, 8192, 8192, ...
llama_model_loader: - kv 18: gemma3n.attention.head_count u32 = 8
llama_model_loader: - kv 19: gemma3n.attention.layer_norm_rms_epsilon f32 = 0.000001
llama_model_loader: - kv 20: gemma3n.attention.key_length u32 = 256
llama_model_loader: - kv 21: gemma3n.attention.value_length u32 = 256
llama_model_loader: - kv 22: gemma3n.rope.freq_base f32 = 1000000.000000
llama_model_loader: - kv 23: gemma3n.attention.sliding_window u32 = 512
llama_model_loader: - kv 24: gemma3n.attention.head_count_kv u32 = 2
llama_model_loader: - kv 25: gemma3n.altup.active_idx u32 = 0
llama_model_loader: - kv 26: gemma3n.altup.num_inputs u32 = 4
llama_model_loader: - kv 27: gemma3n.embedding_length_per_layer_input u32 = 256
llama_model_loader: - kv 28: gemma3n.attention.shared_kv_layers u32 = 10
llama_model_loader: - kv 29: gemma3n.activation_sparsity_scale arr[f32,30] = [1.644854, 1.644854, 1.644854, 1.6448...
llama_model_loader: - kv 30: gemma3n.attention.sliding_window_pattern arr[bool,30] = [true, true, true, true, false, true,...
llama_model_loader: - kv 31: tokenizer.chat_template str = {{ bos_token }}\n{%- if messages[0]['r...
llama_model_loader: - kv 32: tokenizer.ggml.model str = llama
llama_model_loader: - kv 33: tokenizer.ggml.pre str = default
llama_model_loader: - kv 34: tokenizer.ggml.tokens arr[str,262144] = ["<pad>", "<eos>", "<bos>", "<unk>", ...
llama_model_loader: - kv 35: tokenizer.ggml.scores arr[f32,262144] = [-1000.000000, -1000.000000, -1000.00...
llama_model_loader: - kv 36: tokenizer.ggml.token_type arr[i32,262144] = [3, 3, 3, 3, 3, 4, 3, 3, 3, 3, 3, 3, ...
llama_model_loader: - kv 37: tokenizer.ggml.bos_token_id u32 = 2
llama_model_loader: - kv 38: tokenizer.ggml.eos_token_id u32 = 106
llama_model_loader: - kv 39: tokenizer.ggml.unknown_token_id u32 = 3
llama_model_loader: - kv 40: tokenizer.ggml.padding_token_id u32 = 0
llama_model_loader: - kv 41: tokenizer.ggml.add_bos_token bool = true
llama_model_loader: - kv 42: tokenizer.ggml.add_sep_token bool = false
llama_model_loader: - kv 43: tokenizer.ggml.add_eos_token bool = false
llama_model_loader: - kv 44: tokenizer.ggml.add_space_prefix bool = false
llama_model_loader: - kv 45: general.quantization_version u32 = 2
llama_model_loader: - kv 46: general.file_type u32 = 19
llama_model_loader: - kv 47: quantize.imatrix.file str = gemma-3n-E2B-it-GGUF/imatrix_unsloth.dat
llama_model_loader: - kv 48: quantize.imatrix.dataset str = unsloth_calibration_gemma-3n-E2B-it.txt
llama_model_loader: - kv 49: quantize.imatrix.entries_count u32 = 400
llama_model_loader: - kv 50: quantize.imatrix.chunks_count u32 = 1326
llama_model_loader: - type f32: 362 tensors
llama_model_loader: - type f16: 93 tensors
llama_model_loader: - type q4_1: 1 tensors
llama_model_loader: - type q2_K: 12 tensors
llama_model_loader: - type q3_K: 30 tensors
llama_model_loader: - type q4_K: 30 tensors
llama_model_loader: - type iq2_xxs: 189 tensors
llama_model_loader: - type iq3_s: 5 tensors
llama_model_loader: - type iq2_s: 5 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type = IQ2_XXS - 2.0625 bpw
print_info: file size = 1.91 GiB (3.68 BPW)
llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'gemma3n'
llama_model_load_from_file_impl: failed to load model
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
/tmp/ipython-input-6-579632107.py in <cell line: 0>()
3 from llama_cpp import Llama
4
----> 5 llm = Llama.from_pretrained(
6 repo_id="unsloth/gemma-3n-E2B-it-GGUF",
7 filename="gemma-3n-E2B-it-UD-IQ2_XXS.gguf",
2 frames
/usr/local/lib/python3.11/dist-packages/llama_cpp/_internals.py in __init__(self, path_model, params, verbose)
54
55 if model is None:
---> 56 raise ValueError(f"Failed to load model from file: {path_model}")
57
58 vocab = llama_cpp.llama_model_get_vocab(model)
ValueError: Failed to load model from file: /root/.cache/huggingface/hub/models--unsloth--gemma-3n-E2B-it-GGUF/snapshots/4a272108ff14d0e1cef82baa8025a60a91307348/./gemma-3n-E2B-it-UD-IQ2_XXS.gguf
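For reference, the cell that triggers this, reconstructed from the truncated traceback above (the actual notebook cell may pass extra arguments such as n_ctx or n_gpu_layers), is:

from llama_cpp import Llama

# Downloads the GGUF from the Hub into the local cache and tries to load
# it; this is the point where llama.cpp rejects the 'gemma3n' architecture.
llm = Llama.from_pretrained(
    repo_id="unsloth/gemma-3n-E2B-it-GGUF",
    filename="gemma-3n-E2B-it-UD-IQ2_XXS.gguf",
)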
I am new to this and not sure how to handle it. What should I do to get past this error? Or is it that llama_cpp does not yet support gemma3n, and I need to wait? The model I am using is gemma-3n-E2B-it-UD-IQ2_XXS.gguf.
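In case it helps with diagnosing this, here is how I would check which llama-cpp-python build is installed. My assumption is that the llama.cpp version bundled in the wheel predates gemma3n support, since "unknown model architecture" usually means exactly that:

import llama_cpp

# Print the installed llama-cpp-python version; an older wheel bundles an
# older llama.cpp that does not recognize the 'gemma3n' architecture string.
print(llama_cpp.__version__)

# If it is outdated, upgrading may help once a release with gemma3n support exists:
#   pip install -U llama-cpp-python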