Here's an example using the BERT tokenizer, which is a WordPiece tokenizer:

```python
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("google-bert/bert-base-cased")
sequence = "A Titan RTX has 24GB of VRAM"
```

The tokenizer takes care of splitting the sequence into tokens available in the tokenizer vocabulary.

```python
tokenized_sequence = tokenizer.tokenize(sequence)
```

The tokens are either words or subwords.
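To see the split, you can print the resulting token list. The output shown below assumes the standard `google-bert/bert-base-cased` vocabulary; words the tokenizer doesn't know are broken into subwords marked with the `##` continuation prefix:

```python
print(tokenized_sequence)
# Expected with the bert-base-cased vocabulary:
# ['A', 'Titan', 'R', '##T', '##X', 'has', '24', '##GB', 'of', 'V', '##RA', '##M']
```

Here, for instance, "RTX" is not in the vocabulary, so it is split into "R", "##T" and "##X", where "##" indicates that the token attaches to the previous one without a space.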