These labels should be the predictions expected from the model: the model uses its standard loss to compute the loss
between its predictions and the expected values (the labels).
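As a rough illustration of what "computing the loss between predictions and labels" means, the sketch below implements a mean cross-entropy in plain Python. It assumes the standard loss is cross-entropy, which is typical for classification heads but is not stated in the text above; the function name `cross_entropy` and the example values are hypothetical, and plain lists stand in for tensors.

```python
import math

def cross_entropy(logits, labels):
    """Mean cross-entropy over a batch.

    logits: list of shape (batch_size, num_classes), raw model scores.
    labels: list of shape (batch_size,), the expected class index per example.
    """
    total = 0.0
    for row, label in zip(logits, labels):
        # Log-sum-exp with the max subtracted for numerical stability.
        m = max(row)
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        # Negative log-probability assigned to the expected class.
        total += log_z - row[label]
    return total / len(labels)

# Hypothetical scores for a batch of 2 examples and 2 classes.
logits = [[2.0, 0.5], [0.1, 1.5]]
labels = [0, 1]
loss = cross_entropy(logits, labels)
```

The loss is small when the highest score matches the label and grows as the model puts more weight on other classes.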
The expected labels differ depending on the model head, for example:
- For sequence classification models ([BertForSequenceClassification]), the model expects a tensor of dimension
  (batch_size), with each value of the batch corresponding to the expected label of the entire sequence.
- For token classification models ([BertForTokenClassification]), the model expects a tensor of dimension
  (batch_size, seq_length), with each value corresponding to the expected label of each individual token.
- For masked language modeling ([BertForMaskedLM]), the model expects a tensor of dimension (batch_size,
  seq_length), with each value corresponding to the expected label of each individual token: the label is the token
  ID at each masked position, and a value to be ignored everywhere else (usually -100).
- For sequence-to-sequence tasks ([BartForConditionalGeneration], [MBartForConditionalGeneration]), the model
  expects a tensor of dimension (batch_size, tgt_seq_length), with each row holding the target sequence associated
  with the corresponding input sequence.
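The masked language modeling case can be sketched as follows. This is a minimal, dependency-free illustration that uses nested Python lists in place of tensors; the helper `build_mlm_labels`, the constant name `IGNORE_INDEX`, and the token IDs are hypothetical and only serve to show the (batch_size, seq_length) layout described above.

```python
IGNORE_INDEX = -100  # value the text above says is usually used for ignored positions

def build_mlm_labels(input_ids, masked_positions):
    """Build labels with the same shape as input_ids.

    input_ids: list of shape (batch_size, seq_length), the original token IDs.
    masked_positions: one set of masked indices per sequence.
    Masked positions keep the original token ID; all others get IGNORE_INDEX.
    """
    labels = []
    for seq, positions in zip(input_ids, masked_positions):
        labels.append(
            [tok if i in positions else IGNORE_INDEX for i, tok in enumerate(seq)]
        )
    return labels

# Hypothetical token IDs for a batch of 2 sequences of length 5.
input_ids = [[101, 7592, 2088, 999, 102], [101, 2023, 2003, 2204, 102]]
masked_positions = [{2}, {1, 3}]
mlm_labels = build_mlm_labels(input_ids, masked_positions)

# For comparison, sequence classification labels for the same batch would be
# a flat list of shape (batch_size,), one class index per sequence.
sequence_labels = [1, 0]
```

Here `mlm_labels` has one entry per token, while `sequence_labels` has one entry per sequence, which mirrors the difference between the (batch_size, seq_length) and (batch_size) shapes.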