Here's an example using the BERT tokenizer, which is a WordPiece tokenizer:
```python
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("google-bert/bert-base-cased")
sequence = "A Titan RTX has 24GB of VRAM"
```
The tokenizer takes care of splitting the sequence into tokens available in the tokenizer vocabulary.
```python
tokenized_sequence = tokenizer.tokenize(sequence)
```
The tokens are either words or subwords. |
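Printing the result of the snippet above makes the split visible; in WordPiece, a leading `##` marks a subword that continues the previous token. The output shown is what we'd expect from the `bert-base-cased` vocabulary, and a different vocabulary would split the sentence differently:

```python
# Continues the snippet above; a "##" prefix marks a subword that
# attaches to the token before it.
print(tokenized_sequence)
# Expected with the bert-base-cased vocabulary:
# ['A', 'Titan', 'R', '##T', '##X', 'has', '24', '##GB', 'of', 'V', '##RA', '##M']
```

Here, for example, "VRAM" is not in the model vocabulary, so it has been split into the in-vocabulary pieces "V", "##RA" and "##M".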