Error(s) in loading state_dict for OFAModel: size mismatch for decoder.image_position_idx

#1
by YongchengYAO - opened

What's working:

from transformers import OFATokenizer, OFAModel
pretrained_base = "PanaceaAI/BiomedGPT-Base-Pretrained"
_tokenizer = OFATokenizer.from_pretrained(pretrained_base)
_model = OFAModel.from_pretrained(pretrained_base)

What fails:

from transformers import OFATokenizer, OFAModel
pretrained_instruct = "PanaceaAI/instruct-biomedgpt-base"
_tokenizer = OFATokenizer.from_pretrained(pretrained_instruct)
_model = OFAModel.from_pretrained(pretrained_instruct)

Error:

Error(s) in loading state_dict for OFAModel:
    size mismatch for decoder.image_position_idx: copying a param with shape torch.Size([1026]) from checkpoint, the shape in current model is torch.Size([1025]).

The issue was resolved as described in this GitHub issue comment: https://github.com/taokz/BiomedGPT/issues/39#issuecomment-2374711794

The details are as follows:

The size mismatch was introduced during development, when I updated the model with an additional positional embedding to match embed_positions.

You can resolve this by loading the model with model = OFAModel.from_pretrained(f"./{model_name}", ignore_mismatched_sizes=True), which ignores the size mismatch (I'm not sure how it affects performance).
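Applied to the failing snippet above, the workaround would look like this (a sketch; ignore_mismatched_sizes is a standard from_pretrained argument, and the mismatched buffer is simply left at its freshly initialized value):

from transformers import OFATokenizer, OFAModel

pretrained_instruct = "PanaceaAI/instruct-biomedgpt-base"
_tokenizer = OFATokenizer.from_pretrained(pretrained_instruct)
# Skip strict shape checking: the 1026-vs-1025 mismatch on
# decoder.image_position_idx no longer raises an error.
_model = OFAModel.from_pretrained(pretrained_instruct, ignore_mismatched_sizes=True)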

Alternatively, you can modify the line

image_position_idx = torch.cat([image_position_idx, torch.tensor([1024] * 768)])

in ./models/ofa/unify_transformer.py, changing [1024] * 768 to [1024] * 769.
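In context, that edit amounts to the following (a sketch of the single relevant line; the surrounding code in unify_transformer.py may differ between versions):

# ./models/ofa/unify_transformer.py
# Old line: pads with 768 extra positions, yielding the 1025-entry buffer
# reported in the error message:
# image_position_idx = torch.cat([image_position_idx, torch.tensor([1024] * 768)])
# New line: one more padding entry grows the buffer to 1026, matching
# decoder.image_position_idx in the instruct checkpoint:
image_position_idx = torch.cat([image_position_idx, torch.tensor([1024] * 769)])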

PanaceaAI changed discussion status to closed
