	The token indices are under the key input_ids:

	encoded_sequence = inputs["input_ids"]
	print(encoded_sequence)
	[101, 138, 18696, 155, 1942, 3190, 1144, 1572, 13745, 1104, 159, 9664, 2107, 102]

	Note that the tokenizer automatically adds "special tokens" (if the associated model relies on them), which are
	reserved IDs the model sometimes uses, for example to mark the beginning and end of a sequence.
	If we decode the previous sequence of ids,

	decoded_sequence = tokenizer.decode(encoded_sequence)

	we will see

	print(decoded_sequence)
	[CLS] A Titan RTX has 24GB of VRAM [SEP]

	because this is the format in which a [BertModel] expects its inputs.
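To make the round trip concrete without downloading a model, here is a minimal, self-contained sketch of the same idea. Everything in it is hypothetical: `TOY_VOCAB`, `toy_encode`, and `toy_decode` are illustrative names, the IDs are made up rather than taken from BERT's vocabulary, and the text is split on whitespace and lowercased instead of using a real subword tokenizer.

```python
# Toy vocabulary: the IDs are invented for illustration, NOT BERT's real vocabulary.
TOY_VOCAB = {
    "[CLS]": 0, "[SEP]": 1, "[UNK]": 2,
    "a": 3, "titan": 4, "rtx": 5, "has": 6, "24gb": 7, "of": 8, "vram": 9,
}
ID_TO_TOKEN = {idx: token for token, idx in TOY_VOCAB.items()}


def toy_encode(text, add_special_tokens=True):
    """Map whitespace-separated words to IDs, optionally wrapped in [CLS] ... [SEP]."""
    ids = [TOY_VOCAB.get(word, TOY_VOCAB["[UNK]"]) for word in text.lower().split()]
    if add_special_tokens:
        ids = [TOY_VOCAB["[CLS]"]] + ids + [TOY_VOCAB["[SEP]"]]
    # Mirror the dictionary layout used above: token indices live under "input_ids".
    return {"input_ids": ids}


def toy_decode(ids):
    """Map IDs back to tokens; special tokens stay visible in the output."""
    return " ".join(ID_TO_TOKEN[i] for i in ids)


inputs = toy_encode("A Titan RTX has 24GB of VRAM")
encoded_sequence = inputs["input_ids"]
decoded_sequence = toy_decode(encoded_sequence)

print(encoded_sequence)  # [0, 3, 4, 5, 6, 7, 8, 9, 1]
print(decoded_sequence)  # [CLS] a titan rtx has 24gb of vram [SEP]
```

As in the real example, the first and last IDs do not correspond to any word in the input: they are the special tokens the encoder added, and decoding makes them visible.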
	L
	labels
	The labels are an optional argument that can be passed so that the model computes the loss itself.
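The "optional argument" pattern can be sketched in plain Python. This is only an illustration under stated assumptions: `toy_forward` is a hypothetical stand-in for a model's forward pass, it takes precomputed logits instead of running a network, and it assumes a classification setting where the loss is the mean cross-entropy. The loss a real model computes depends on its head.

```python
import math


def toy_forward(logits, labels=None):
    """Return the logits, plus a mean cross-entropy loss only when labels are given."""
    output = {"logits": logits}
    if labels is not None:
        losses = []
        for row, label in zip(logits, labels):
            # Log-sum-exp with max subtraction for numerical stability.
            row_max = max(row)
            log_sum_exp = row_max + math.log(sum(math.exp(x - row_max) for x in row))
            # Cross-entropy for one example: -log softmax(row)[label].
            losses.append(log_sum_exp - row[label])
        output["loss"] = sum(losses) / len(losses)
    return output


logits = [[2.0, 0.5], [0.1, 1.2]]  # two examples, two classes (made-up values)

without_labels = toy_forward(logits)
with_labels = toy_forward(logits, labels=[0, 1])

print("loss" in without_labels)  # False: no labels, so no loss is computed
print("loss" in with_labels)     # True: the loss is computed inside the call
```

Without labels the caller gets only predictions and would have to compute a loss separately. With labels, the loss comes back alongside the logits.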