Here's an example using the BERT tokenizer, which is a WordPiece tokenizer:

```python
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("google-bert/bert-base-cased")
sequence = "A Titan RTX has 24GB of VRAM"
```

The tokenizer takes care of splitting the sequence into tokens available in the tokenizer vocabulary.

```python
tokenized_sequence = tokenizer.tokenize(sequence)
```

The tokens are either words or subwords.
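To see this in action, we can print the tokens; a minimal check, assuming the snippets above have been run. In WordPiece, a `##` prefix marks a subword that continues the previous token, and the exact splits depend on the checkpoint's vocabulary.

```python
# "##" prefixes mark subword continuations in WordPiece.
# With this checkpoint, a word like "VRAM" that is not in the vocabulary
# gets split into subwords (exact splits depend on the vocabulary), e.g.:
# ['A', 'Titan', 'R', '##T', '##X', 'has', '24', '##GB', 'of', 'V', '##RA', '##M']
print(tokenized_sequence)
```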