Model does not load

#1
by ekuznets - opened

The model does not load with the current version of transformers.

The problem is that there are numerous weight-name mismatches. For example, transformers expects language_model.model.layers.0.feed_forward.experts.gate_up_proj, but the checkpoint actually contains language_model.model.layers.0.feed_forward.experts.gate_up_proj.weight.
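
For anyone hitting the same thing, a quick way to confirm is to look at the tensor names stored in the checkpoint index. This is just a sketch: the path is a placeholder for wherever the repo was downloaded, and it assumes a sharded safetensors checkpoint with a model.safetensors.index.json file.

```python
# Quick check of how the expert tensors are actually named in the checkpoint.
# The path below is a placeholder for the locally downloaded repo, and this assumes
# a sharded safetensors checkpoint with an index file.
import json

index_path = "path/to/checkpoint/model.safetensors.index.json"  # hypothetical path
with open(index_path) as f:
    weight_map = json.load(f)["weight_map"]

for name in sorted(weight_map):
    if "gate_up_proj" in name:
        print(name)  # prints names ending in ".gate_up_proj.weight"
```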

I'm not sure how this could have happened. I see that gate_up_proj is an nn.Parameter in llama4: https://github.com/huggingface/transformers/blob/main/src/transformers/models/llama4/modeling_llama4.py#L55 (in several other models, modules with this name are nn.Linear, and in those the weight names end with "gate_up_proj.weight").
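
A toy example of the naming difference (the class names here are made up for illustration): a parameter registered directly on a module appears in the state_dict under its own name, while an nn.Linear submodule contributes an extra ".weight" suffix.

```python
# Toy illustration of why the suffix differs. A raw nn.Parameter registered on a module
# appears in the state_dict under its own name; an nn.Linear submodule contributes
# "<name>.weight". Class names here are made up for illustration.
import torch
from torch import nn


class ParamStyleExperts(nn.Module):
    def __init__(self):
        super().__init__()
        # Fused expert weights stored as a raw parameter, as in the linked llama4 code
        self.gate_up_proj = nn.Parameter(torch.empty(4, 8, 16))


class LinearStyleMLP(nn.Module):
    def __init__(self):
        super().__init__()
        # Conventional fused projection as a Linear layer
        self.gate_up_proj = nn.Linear(8, 16, bias=False)


print(list(ParamStyleExperts().state_dict()))  # ['gate_up_proj']
print(list(LinearStyleMLP().state_dict()))     # ['gate_up_proj.weight']
```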
