---
library_name: transformers
tags:
- tokenizer
- mlm
license: mit
---

# claude tokenizer: mlm

A variant of [Xenova/claude-tokenizer](https://huggingface.co/Xenova/claude-tokenizer) with small changes that let it be used as a masked language modeling (MLM) tokenizer.

```py
from transformers import AutoTokenizer

# Load the tokenizer from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained('pszemraj/claude-tokenizer-mlm')

text = "Hello, this is a test input."
ids = tokenizer(text)

# Decode with special tokens kept, to show what the tokenizer adds
print(tokenizer.decode(ids['input_ids'], skip_special_tokens=False))
# <bos>Hello, this is a test input.<EOT>
```
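
To illustrate what "usage as an MLM tokenizer" typically involves, here is a minimal sketch of the masking step in plain Python. It is not taken from this repository: the function name `mask_for_mlm` and the ids `BOS_ID`, `EOT_ID`, and `MASK_ID` are hypothetical, and the sketch assumes the tokenizer exposes a mask token (for example via `tokenizer.mask_token_id`) and its special ids (via `tokenizer.all_special_ids`). It also simplifies the usual recipe by always substituting the mask token rather than sometimes keeping or randomizing the original token.

```python
import random


def mask_for_mlm(input_ids, mask_token_id, special_ids, mlm_probability=0.15, seed=0):
    """Return (masked_ids, labels) for a single sequence.

    labels hold the original id at masked positions and -100 elsewhere
    (-100 is the index that PyTorch's cross-entropy loss ignores by default).
    """
    rng = random.Random(seed)
    masked, labels = [], []
    for tok in input_ids:
        # Never mask special tokens such as <bos> or <EOT>
        if tok not in special_ids and rng.random() < mlm_probability:
            masked.append(mask_token_id)
            labels.append(tok)
        else:
            masked.append(tok)
            labels.append(-100)
    return masked, labels


# Hypothetical ids for illustration only; real values come from the tokenizer.
BOS_ID, EOT_ID, MASK_ID = 0, 1, 2
ids = [BOS_ID, 11, 12, 13, 14, 15, 16, EOT_ID]

masked, labels = mask_for_mlm(
    ids, MASK_ID, {BOS_ID, EOT_ID, MASK_ID}, mlm_probability=0.5
)
print(masked)
print(labels)
```

In practice, `DataCollatorForLanguageModeling` from `transformers` performs this step on batches, provided the tokenizer defines a mask token.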