Error(s) in loading state_dict for OFAModel: size mismatch for decoder.image_position_idx
#1 · opened by YongchengYAO
What's working:
from transformers import OFATokenizer, OFAModel
pretrained_base = "PanaceaAI/BiomedGPT-Base-Pretrained"
_tokenizer = OFATokenizer.from_pretrained(pretrained_base)
_model = OFAModel.from_pretrained(pretrained_base)
What fails:
from transformers import OFATokenizer, OFAModel
pretrained_instruct = "PanaceaAI/instruct-biomedgpt-base"
_tokenizer = OFATokenizer.from_pretrained(pretrained_instruct)
_model = OFAModel.from_pretrained(pretrained_instruct)
Error:
Error(s) in loading state_dict for OFAModel:
size mismatch for decoder.image_position_idx: copying a param with shape torch.Size([1026]) from checkpoint, the shape in current model is torch.Size([1025]).
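To confirm the offending shape in the checkpoint itself, you can inspect it directly. A minimal sketch, assuming the repository ships its weights as pytorch_model.bin (the filename is an assumption):
from huggingface_hub import hf_hub_download
import torch

# Download just the weights file and look at the mismatched buffer.
ckpt_path = hf_hub_download("PanaceaAI/instruct-biomedgpt-base", "pytorch_model.bin")
state_dict = torch.load(ckpt_path, map_location="cpu")
print(state_dict["decoder.image_position_idx"].shape)  # expected: torch.Size([1026])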
The issue was resolved as described in this GitHub issue: https://github.com/taokz/BiomedGPT/issues/39#issuecomment-2374711794
The details are as follows:
The mismatch occurred during development, when I updated the model with an additional positional embedding to match embed_positions.
You can resolve this by loading the model with model = OFAModel.from_pretrained(f"./{model_name}", ignore_mismatched_sizes=True), which ignores the size mismatch (though I am not sure how it affects performance).
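Applied to the checkpoint above, that workaround looks like this (a minimal sketch; ignore_mismatched_sizes is a standard from_pretrained argument that skips any parameter or buffer whose checkpoint shape disagrees with the model, leaving it at the value the model initializes itself with):
from transformers import OFATokenizer, OFAModel

pretrained_instruct = "PanaceaAI/instruct-biomedgpt-base"
_tokenizer = OFATokenizer.from_pretrained(pretrained_instruct)
# decoder.image_position_idx is skipped during loading and keeps the value
# computed in the model's __init__; all other weights load normally.
_model = OFAModel.from_pretrained(pretrained_instruct, ignore_mismatched_sizes=True)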
Alternatively, in ./models/ofa/unify_transformer.py you can modify the line:
image_position_idx = torch.cat([image_position_idx, torch.tensor([1024] * 768)])
so that [1024] * 768 becomes [1024] * 769, adding the one extra position index the checkpoint expects.
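The edit is a single-constant change, sketched below; only the quoted line comes from the repository, and the length arithmetic (1025 vs. 1026) is inferred from the error message:
# Before: appending 768 padding indices yields a buffer of length 1025.
image_position_idx = torch.cat([image_position_idx, torch.tensor([1024] * 768)])
# After: appending 769 yields length 1026, matching the instruct checkpoint.
image_position_idx = torch.cat([image_position_idx, torch.tensor([1024] * 769)])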
PanaceaAI changed discussion status to closed