metadata
library_name: transformers
tags:
- transformers.js
- tokenizers
license: mit
Claude Tokenizer
A 🤗-compatible version of the Claude tokenizer (adapted from anthropics/anthropic-sdk-python). This means it can be used with Hugging Face libraries including Transformers, Tokenizers, and Transformers.js.
Usage (Transformers.js)
If you haven't already, you can install the Transformers.js JavaScript library from NPM using:
npm i @huggingface/transformers
Example: Tokenize text using the Claude tokenizer
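import { AutoTokenizer } from '@huggingface/transformers';

// Fetch the tokenizer files from the Hugging Face Hub and build the tokenizer
const tokenizer = await AutoTokenizer.from_pretrained('Xenova/claude-tokenizer');
// Encode a string into an array of token IDs
const tokens = tokenizer.encode('hello world'); // [9381, 2253]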
Example usage:
Transformers/Tokenizers
from transformers import GPT2TokenizerFast

# Fetch the tokenizer files from the Hugging Face Hub
tokenizer = GPT2TokenizerFast.from_pretrained('Xenova/claude-tokenizer')
# Encoding returns a plain list of token IDs
assert tokenizer.encode('hello world') == [9381, 2253]
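A common reason to load this tokenizer is to count tokens or trim text to a token budget. The sketch below shows one way to do that. The helper names `count_tokens` and `truncate_to_budget` are hypothetical and not part of any library. They only assume that the object passed in exposes `encode(text)` returning a list of IDs and `decode(ids)` returning a string, as the `GPT2TokenizerFast` instance above does.

```python
from typing import List, Protocol


class SupportsEncodeDecode(Protocol):
    """Minimal interface the helpers rely on (hypothetical type, for illustration)."""

    def encode(self, text: str) -> List[int]: ...

    def decode(self, token_ids: List[int]) -> str: ...


def count_tokens(tokenizer: SupportsEncodeDecode, text: str) -> int:
    """Return the number of token IDs the tokenizer produces for `text`."""
    return len(tokenizer.encode(text))


def truncate_to_budget(tokenizer: SupportsEncodeDecode, text: str, max_tokens: int) -> str:
    """Return `text` unchanged if it fits in `max_tokens`, else decode the first `max_tokens` IDs.

    Assumption: cutting a token sequence at an arbitrary position and decoding it
    is acceptable for the caller. With a byte-level BPE tokenizer the cut can land
    inside a multi-byte character, so the tail of the result may need cleanup.
    """
    token_ids = tokenizer.encode(text)
    if len(token_ids) <= max_tokens:
        return text
    return tokenizer.decode(token_ids[:max_tokens])
```

With the tokenizer loaded as in the example above, `count_tokens(tokenizer, 'hello world')` should return 2, since that string encodes to the two IDs `[9381, 2253]`.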