---
library_name: transformers
tags:
- tokenizer
- mlm
license: mit
---

# claude tokenizer: mlm

A variant of [Xenova/claude-tokenizer](https://huggingface.co/Xenova/claude-tokenizer) with small changes to support use as a masked language modeling (MLM) tokenizer.

```py
from transformers import AutoTokenizer

# Load the tokenizer from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained('pszemraj/claude-tokenizer-mlm')

text = "Hello, this is a test input."
ids = tokenizer(text)

# Decode with special tokens kept to see what the tokenizer adds
print(tokenizer.decode(ids['input_ids'], skip_special_tokens=False))
# <bos>Hello, this is a test input.<EOT>
```
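
For context on what MLM usage involves, here is a minimal pure-Python sketch of the masking step. It is a simplified illustration, not this tokenizer's or `transformers`' implementation: `mask_tokens`, the token ids, and `mask_id=99` are all hypothetical placeholders, and every selected token is replaced by the mask id (the `DataCollatorForLanguageModeling` class in `transformers` uses a more elaborate 80/10/10 replacement scheme). This card does not list which special tokens were changed, so in real use the ids would presumably come from the loaded tokenizer (e.g. `tokenizer.mask_token_id`) rather than being hard-coded.

```py
import random


def mask_tokens(input_ids, mask_id, special_ids=(), mlm_probability=0.15, seed=None):
    """Randomly replace non-special tokens with mask_id.

    Returns (masked_ids, labels). labels holds the original token at masked
    positions and -100 elsewhere, the value commonly used to ignore a
    position in the loss.
    """
    rng = random.Random(seed)
    masked, labels = [], []
    for tok in input_ids:
        if tok not in special_ids and rng.random() < mlm_probability:
            masked.append(mask_id)  # hide the token from the model
            labels.append(tok)      # the model should predict the original
        else:
            masked.append(tok)      # leave the token unchanged
            labels.append(-100)     # not a prediction target
    return masked, labels


# Hypothetical ids: 0 and 1 stand in for <bos> and <EOT>, 99 for a mask token.
ids = [0, 11, 12, 13, 14, 15, 1]
masked, labels = mask_tokens(ids, mask_id=99, special_ids={0, 1},
                             mlm_probability=0.5, seed=0)
print(masked)
print(labels)
```

Special tokens such as `<bos>` and `<EOT>` are excluded from masking so the model always sees the sequence boundaries.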