Huge perplexity value
#20
by zhuqiang - opened
Hi all, I just noticed that gemma-3n gives a huge perplexity value even for a very simple input.
Reproduce
import torch
from transformers import AutoProcessor, AutoModelForImageTextToText

model = AutoModelForImageTextToText.from_pretrained(
    "models/gemma-3n-E2B-it",
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
    # attn_implementation="eager",
    device_map='cuda',
).eval()
processor = AutoProcessor.from_pretrained("models/gemma-3n-E2B-it")

messages = [
    {
        "role": "system",
        "content": [{"type": "text", "text": "You are a helpful assistant."}],
    },
    {
        "role": "user",
        "content": [{"type": "text", "text": "Hi"}],
    },
    {
        "role": "assistant",
        "content": [{"type": "text", "text": "How are you?"}],
    },
]

text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=False)
print(text)

encodings = processor(text=text, images=None, videos=None, padding=False, return_tensors="pt")
input_ids = encodings.input_ids.to('cuda')

target_ids = input_ids.clone()
trg_len = -2
target_ids[:, :trg_len] = -100

with torch.no_grad():
    outputs = model(input_ids, labels=target_ids)
    nll = outputs.loss
ppl = torch.exp(nll)
print(ppl)
Output
tensor(43704.4180, device='cuda:0')
Packages:
transformers==4.56.2
torch==2.8.0
Hi @zhuqiang,

I believe the issue is with your label masking. The line target_ids[:, :trg_len] = -100 with trg_len = -2 masks everything except the last two tokens, so the loss (and therefore the perplexity) is computed on only those two tokens, which produces the massive score. To fix this, mask the entire input prompt and compute the loss only on the response tokens you want to evaluate: find the token length of your prompt (prompt_len), then apply the mask as target_ids[:, :prompt_len] = -100.
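Here is a minimal sketch of that fix, building on your script above. It assumes the text rendered from messages[:-1] with add_generation_prompt=True is an exact prefix of your full text (this holds for the Gemma chat template, since the generation prompt is just the assistant turn header, but it's worth verifying by printing both strings):

# Re-render the conversation without the assistant turn to find where the
# response starts. add_generation_prompt=True appends the assistant header,
# so prompt_len covers everything up to the first response token.
prompt_text = processor.apply_chat_template(
    messages[:-1], tokenize=False, add_generation_prompt=True
)
prompt_len = processor(text=prompt_text, return_tensors="pt").input_ids.shape[1]

target_ids = input_ids.clone()
target_ids[:, :prompt_len] = -100  # mask the prompt; score only the response

with torch.no_grad():
    outputs = model(input_ids, labels=target_ids)
ppl = torch.exp(outputs.loss)
print(ppl)  # perplexity over the assistant response only

With the loss averaged over the whole response instead of just the final two template tokens, the perplexity should come back down to a sensible value.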
Thank you