Ahmadzei's picture
update 1
57bdca5
raw
history blame
966 Bytes
The
[PreTrainedTokenizerFast] class allows for easy instantiation, by accepting the instantiated
tokenizer object as an argument:
thon
from transformers import PreTrainedTokenizerFast
fast_tokenizer = PreTrainedTokenizerFast(tokenizer_object=tokenizer)
This object can now be used with all the methods shared by the 🤗 Transformers tokenizers! Head to the tokenizer
page for more information.
Loading from a JSON file
In order to load a tokenizer from a JSON file, let's first start by saving our tokenizer:
thon
tokenizer.save("tokenizer.json")
The path to which we saved this file can be passed to the [PreTrainedTokenizerFast] initialization
method using the tokenizer_file parameter:
thon
from transformers import PreTrainedTokenizerFast
fast_tokenizer = PreTrainedTokenizerFast(tokenizer_file="tokenizer.json")
This object can now be used with all the methods shared by the 🤗 Transformers tokenizers! Head to the tokenizer
page for more information..