
Getting this runtime error on Colab

#1 opened by elihoole

RuntimeError Traceback (most recent call last)
/tmp/ipython-input-2302535289.py in <cell line: 0>()
6
7 base_model = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B")
----> 8 model = PeftModel.from_pretrained(base_model, "polyglots/SinLlama_v01")

3 frames
/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py in load_state_dict(self, state_dict, strict, assign)
   2622 
   2623         if len(error_msgs) > 0:
-> 2624             raise RuntimeError(
   2625                 "Error(s) in loading state_dict for {}:\n\t{}".format(
   2626                     self.__class__.__name__, "\n\t".join(error_msgs)

RuntimeError: Error(s) in loading state_dict for PeftModelForCausalLM:
size mismatch for base_model.model.model.embed_tokens.original_module.weight: copying a param with shape torch.Size([139336, 4096]) from checkpoint, the shape in current model is torch.Size([128256, 4096]).
size mismatch for base_model.model.model.embed_tokens.modules_to_save.default.weight: copying a param with shape torch.Size([139336, 4096]) from checkpoint, the shape in current model is torch.Size([128256, 4096]).
size mismatch for base_model.model.lm_head.original_module.weight: copying a param with shape torch.Size([139336, 4096]) from checkpoint, the shape in current model is torch.Size([128256, 4096]).
size mismatch for base_model.model.lm_head.modules_to_save.default.weight: copying a param with shape torch.Size([139336, 4096]) from checkpoint, the shape in current model is torch.Size([128256, 4096]).

This is the full code I use to load the model for further experiments:

# Install dependencies
!pip install unsloth # @ git+https://github.com/unslothai/unsloth.git
!pip install datasets==2.21.0
!pip install pandas==2.1.4

# Import dependencies
from unsloth import FastLanguageModel, is_bfloat16_supported
from transformers import TextStreamer, AutoTokenizer
import torch
from datasets import load_dataset, DatasetDict, concatenate_datasets, Dataset
from collections import Counter, defaultdict
import os
import sys

from trl import SFTTrainer
from transformers import TrainingArguments
import pandas as pd

# Load the base model
model_config = {"model_name": "unsloth/llama-3-8b", "load_in_4bit": False}
max_seq_length = 2048 # Choose any! We auto support RoPE Scaling internally!
dtype = None # None for auto detection. Float16 for Tesla T4, V100, Bfloat16 for Ampere+
load_in_4bit = False # Use 4bit quantization to reduce memory usage. Can be False.
model_name = "polyglots/SinLlama_v01" # load the SinLlama checkpoint instead of the base model

# Load the model
model, _ = FastLanguageModel.from_pretrained(
    model_name = model_name,
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
    resize_model_vocab=139336,
    # token = "hf_...", # use one if using gated models like meta-llama/Llama-2-7b-hf
)

# Load our extended tokenizer
tokenizer = AutoTokenizer.from_pretrained("polyglots/Extended-Sinhala-LLaMA")
model.resize_token_embeddings(len(tokenizer))
Polyglots FYP org replied:

The error occurs because the checkpoint you’re trying to load (polyglots/SinLlama_v01) was trained with an extended vocabulary of 139,336 tokens, while the base model (meta-llama/Meta-Llama-3-8B) has only 128,256 tokens. Because the adapter stores full copies of embed_tokens and lm_head under modules_to_save (as the error message shows), those tensors must match the base model’s shapes exactly. When you try to load the checkpoint directly with PeftModel.from_pretrained, the embedding layer and the LM head do not match in size, which raises the size-mismatch error.
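
You can confirm the mismatch without loading any weights. This is a quick, untested sketch; it assumes the extended tokenizer in polyglots/Extended-Sinhala-LLaMA carries the 139,336-token vocabulary the adapter was saved with:

from transformers import AutoConfig, AutoTokenizer

# The base Llama 3 config reports the original vocabulary size.
base_cfg = AutoConfig.from_pretrained("meta-llama/Meta-Llama-3-8B")  # pass token=... if the gated repo requires it
print(base_cfg.vocab_size)  # 128256, the row count of the current embedding matrix

# The extended Sinhala tokenizer defines the larger vocabulary the adapter expects.
ext_tokenizer = AutoTokenizer.from_pretrained("polyglots/Extended-Sinhala-LLaMA")
print(len(ext_tokenizer))  # should report 139336, matching the checkpoint shapes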

The solution is to resize the model’s token embeddings to match the checkpoint’s vocabulary before loading the PEFT model. In the code you shared, this is handled by:

model, _ = FastLanguageModel.from_pretrained(
    model_name = model_name,
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
    resize_model_vocab=139336,  # resize embeddings to match checkpoint
)
tokenizer = AutoTokenizer.from_pretrained("polyglots/Extended-Sinhala-LLaMA")
model.resize_token_embeddings(len(tokenizer))  # align embeddings with tokenizer

This ensures the model’s embeddings and LM head are the correct size, avoiding the mismatch.
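
If you prefer to stay with plain transformers + peft (the stack in your traceback) rather than Unsloth, the same idea applies: resize the base model before attaching the adapter. A minimal sketch, assuming the extended tokenizer has the vocabulary size the checkpoint was saved with:

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

tokenizer = AutoTokenizer.from_pretrained("polyglots/Extended-Sinhala-LLaMA")
base_model = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B")

# Grow embed_tokens and lm_head from 128,256 to 139,336 rows so their
# shapes match the tensors stored in the SinLlama_v01 checkpoint.
base_model.resize_token_embeddings(len(tokenizer))

# With the shapes aligned, the adapter (including its modules_to_save copies) loads cleanly.
model = PeftModel.from_pretrained(base_model, "polyglots/SinLlama_v01")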
