Embedding from transformers

#6
by tillwenke - opened

Why do you divide by the sum of attention-mask values over ALL tokens across all sentences in the embedding example in the model card?

outputs = torch.sum(
    outputs * inputs["attention_mask"][:, :, None], dim=1
) / torch.sum(inputs["attention_mask"])

It doesn't do any harm for cosine similarity (each sentence embedding is only scaled by a positive constant), but I'd rather divide by the number of tokens in each sentence.
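As a minimal sketch of what I mean by per-sentence normalization (the model name and variable names here are my own assumptions, not taken from the model card):

# Sketch: mean pooling where each sentence is divided by its own token count.
# Model name is an assumed example; swap in the model from the card.
import torch
from transformers import AutoModel, AutoTokenizer

model_name = "sentence-transformers/all-MiniLM-L6-v2"  # assumed example model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)

sentences = ["A short sentence.", "A second, noticeably longer example sentence."]
inputs = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs).last_hidden_state        # (batch, seq_len, hidden)

mask = inputs["attention_mask"][:, :, None]             # (batch, seq_len, 1)
summed = torch.sum(outputs * mask, dim=1)                # (batch, hidden)
counts = torch.sum(inputs["attention_mask"], dim=1, keepdim=True)  # (batch, 1)
embeddings = summed / counts                             # per-sentence mean pooling

The only change from the snippet above is the denominator: summing the attention mask with dim=1, keepdim=True gives each sentence its own token count instead of one scalar for the whole batch.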
