The token indices are under the key `input_ids`:

```python
>>> encoded_sequence = inputs["input_ids"]
>>> print(encoded_sequence)
[101, 138, 18696, 155, 1942, 3190, 1144, 1572, 13745, 1104, 159, 9664, 2107, 102]
```

Note that the tokenizer automatically adds "special tokens" (if the associated model relies on them). These are reserved IDs that the model uses to mark structure in its input. Here, 101 and 102 at the two ends of the sequence are BERT's `[CLS]` and `[SEP]` tokens.

If we decode the previous sequence of ids,

```python
>>> decoded_sequence = tokenizer.decode(encoded_sequence)
```

we will see

```python
>>> print(decoded_sequence)
[CLS] A Titan RTX has 24GB of VRAM [SEP]
```

because this is the way a [BertModel] is going to expect its inputs.

## L

### labels

The labels are an optional argument that can be passed so that the model computes the loss itself.
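To make the labels behaviour concrete, here is a minimal sketch in plain Python. It assumes a single-label classification task scored with mean cross-entropy, which is a common choice but not necessarily what every model head uses. `toy_forward` is a hypothetical stand-in for a model's forward pass, not a Transformers API. It always returns the logits and adds a `loss` entry only when `labels` is supplied.

```python
import math
from typing import Optional, Sequence


def toy_forward(
    logits: Sequence[Sequence[float]],
    labels: Optional[Sequence[int]] = None,
) -> dict:
    """Hypothetical forward pass: returns logits, plus a loss only if labels are given."""
    output = {"logits": logits}
    if labels is not None:
        total = 0.0
        for row, label in zip(logits, labels):
            # Numerically stable log-sum-exp over the class scores.
            m = max(row)
            log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
            # Cross-entropy for one example: -log softmax(row)[label].
            total += log_sum_exp - row[label]
        # Average the per-example losses over the batch.
        output["loss"] = total / len(labels)
    return output


# Without labels: only logits come back.
print("loss" in toy_forward([[2.0, 0.0]]))
# With labels: two equally likely classes give a loss of log(2).
print(toy_forward([[0.0, 0.0]], labels=[1])["loss"])
```

Calling `toy_forward([[0.0, 0.0]], labels=[1])` gives a loss of log 2 (about 0.693), since both classes are equally likely, while omitting `labels` returns the logits alone. Computing the loss inside the forward pass means training code does not have to re-implement the loss function for each model head.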