Here's an example using the BERT tokenizer, which is a WordPiece tokenizer:
```python
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("google-bert/bert-base-cased")
sequence = "A Titan RTX has 24GB of VRAM"
```

The tokenizer takes care of splitting the sequence into tokens available in its vocabulary.
```python
tokenized_sequence = tokenizer.tokenize(sequence)
```

The tokens are either words or subwords.
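Printing the result makes the word/subword split visible. The exact output below is a sketch and depends on the vocabulary of the checkpoint loaded above; with `google-bert/bert-base-cased`, words absent from the vocabulary (such as "VRAM") are typically broken into known subword pieces, with continuation pieces prefixed by `##`:

```python
print(tokenized_sequence)
# Expected output (assuming the google-bert/bert-base-cased vocabulary):
# ['A', 'Titan', 'R', '##T', '##X', 'has', '24', '##GB', 'of', 'V', '##RA', '##M']
```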