library_name: transformers
tags:
  - tokenizer
  - mlm
license: mit

claude tokenizer: mlm

A variant of Xenova/claude-tokenizer, lightly modified so it can be used as a masked language modeling (MLM) tokenizer.

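from transformers import AutoTokenizer

# Load the tokenizer from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained('pszemraj/claude-tokenizer-mlm')

text = "Hello, this is a test input."
encoded = tokenizer(text)  # token ids are stored under 'input_ids'
# Keep special tokens in the decoded string to show what the tokenizer adds
print(tokenizer.decode(encoded['input_ids'], skip_special_tokens=False))
# <bos>Hello, this is a test input.<EOT>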
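
For context on what "usage as an MLM tokenizer" involves, here is a minimal pure-Python sketch of the masking step. It is not taken from this repo: the function name `mask_for_mlm`, the token ids, and the mask id `99` are hypothetical placeholders, and the sketch simplifies standard MLM by always substituting the mask id rather than using the BERT-style 80/10/10 mask/random/keep split.

```python
import random


def mask_for_mlm(input_ids, mask_token_id, special_ids=(), mlm_probability=0.15, seed=None):
    """Return (masked_ids, labels) for a single sequence of token ids.

    Simplified illustration: each non-special token is replaced by
    mask_token_id with probability mlm_probability.
    """
    rng = random.Random(seed)
    masked, labels = [], []
    for tok in input_ids:
        if tok not in special_ids and rng.random() < mlm_probability:
            masked.append(mask_token_id)
            labels.append(tok)    # the model is trained to recover the original id
        else:
            masked.append(tok)
            labels.append(-100)   # -100 is the default ignore index of PyTorch's cross-entropy loss
    return masked, labels


# Hypothetical ids for illustration only; they are not real vocabulary entries.
example_ids = [0, 11, 12, 13, 14, 1]
masked, labels = mask_for_mlm(
    example_ids, mask_token_id=99, special_ids={0, 1}, mlm_probability=0.5, seed=0
)
print(masked)
print(labels)
```

With the real tokenizer, the ids would come from `tokenizer(text)['input_ids']`, and the mask and special ids from `tokenizer.mask_token_id` and `tokenizer.all_special_ids`, assuming this tokenizer defines a mask token. In practice, `DataCollatorForLanguageModeling(tokenizer, mlm=True)` from `transformers` performs this step, including the full replacement scheme.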