Why does using the same fastconformer_hybrid_tdt_ctc_bpe.yaml config to fine-tune a pre-trained model result in a "size mismatch" error?
```
RuntimeError: Error(s) in loading state_dict for EncDecRNNTBPEModel:
size mismatch for encoder.pre_encode.out.weight: copying a param with shape torch.Size([1024, 4096]) from checkpoint, the shape in current model is torch.Size([512, 4096]).
size mismatch for encoder.pre_encode.out.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for encoder.layers.0.norm_feed_forward1.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for encoder.layers.0.norm_feed_forward1.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for encoder.layers.0.feed_forward1.linear1.weight: copying a param with shape torch.Size([4096, 1024]) from checkpoint, the shape in current model is torch.Size([2048, 512]).
```
It seems you are initializing from a checkpoint with a different model configuration: your current config sets the encoder d_model to 512, while the checkpoint was trained with d_model of 1024. To load its weights, the encoder settings in your fine-tuning YAML (d_model and the sizes derived from it) have to match the checkpoint's; one way to check them is sketched below.
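A minimal sketch, assuming you are fine-tuning from a pretrained NeMo model, of how you might inspect the checkpoint's encoder config so you can align your YAML with it. The model name used here is a placeholder, not necessarily the checkpoint from your setup:

```python
import nemo.collections.asr as nemo_asr
from omegaconf import OmegaConf

# Load the pretrained model you intend to fine-tune from.
# "stt_en_fastconformer_hybrid_large_pc" is a placeholder name; substitute
# the checkpoint you are actually using (or restore_from() a local .nemo file).
model = nemo_asr.models.ASRModel.from_pretrained("stt_en_fastconformer_hybrid_large_pc")

# Print the encoder section of its config. The encoder block in your
# fine-tuning YAML (d_model, n_layers, d_ff, ...) must use these same values,
# otherwise loading the state_dict fails with size mismatches like the ones above.
print(OmegaConf.to_yaml(model.cfg.encoder))
```

Alternatively, since a .nemo checkpoint carries its own model_config.yaml inside, you can start from that stored config and override only the data, tokenizer, and optimization sections for fine-tuning, rather than reusing the default fastconformer_hybrid_tdt_ctc_bpe.yaml as-is.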