Embedding from transformers
#6 opened by tillwenke
Why do you divide by the sum of the attention mask over ALL tokens across all the sentences embedded in the model card example?
outputs = torch.sum(
    outputs * inputs["attention_mask"][:, :, None], dim=1
) / torch.sum(inputs["attention_mask"])
This doesn't do any harm for cosine similarity, but I'd rather divide by the number of tokens in each sentence.
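For illustration, a per-sentence version might look like the sketch below. It assumes `outputs` is the model's last hidden state of shape (batch, seq_len, hidden) and `inputs["attention_mask"]` is (batch, seq_len); the helper name `mean_pool` is just mine, not from the model card.

import torch

def mean_pool(last_hidden_state: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    # Zero out padding tokens, then average each sentence by its own token count.
    mask = attention_mask[:, :, None].float()             # (batch, seq_len, 1)
    summed = torch.sum(last_hidden_state * mask, dim=1)   # (batch, hidden)
    counts = mask.sum(dim=1).clamp(min=1e-9)              # (batch, 1) tokens per sentence
    return summed / counts

# e.g. embeddings = mean_pool(outputs, inputs["attention_mask"])

That way each sentence embedding is a true mean over its own (non-padding) tokens rather than being scaled by the total token count of the whole batch.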