metadata
library_name: transformers
tags:
  - transformers.js
  - tokenizers
license: mit

Claude Tokenizer

A 🤗-compatible version of the Claude tokenizer (adapted from anthropics/anthropic-sdk-python). This means it can be used with Hugging Face libraries including Transformers, Tokenizers, and Transformers.js.

Usage (Transformers.js)

If you haven't already, you can install the Transformers.js JavaScript library from NPM using:

npm i @huggingface/transformers

Example: Tokenize text using the Claude tokenizer

import { AutoTokenizer } from '@huggingface/transformers';

const tokenizer = await AutoTokenizer.from_pretrained('Xenova/claude-tokenizer');
const tokens = tokenizer.encode('hello world'); // [9381, 2253]

Example usage:

Transformers/Tokenizers

from transformers import GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained('Xenova/claude-tokenizer')
assert tokenizer.encode('hello world') == [9381, 2253]
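
A common reason to load a tokenizer on its own is to count tokens before sending text anywhere. The following is a minimal sketch of that idea, not part of this repository: `count_tokens`, `fits_budget`, and `load_claude_tokenizer` are hypothetical helper names, and the only library calls assumed are `GPT2TokenizerFast.from_pretrained` and `encode`, as used in the example above. The counts are only as accurate as this tokenizer's match to the model you are targeting.

```python
from typing import Protocol, Sequence


class SupportsEncode(Protocol):
    """Hypothetical helper type: any object exposing encode(text) -> token ids."""

    def encode(self, text: str) -> Sequence[int]: ...


def count_tokens(tokenizer: SupportsEncode, text: str) -> int:
    """Return how many token ids the tokenizer produces for `text`."""
    return len(tokenizer.encode(text))


def fits_budget(tokenizer: SupportsEncode, text: str, max_tokens: int) -> bool:
    """Return True if `text` encodes to at most `max_tokens` tokens."""
    return count_tokens(tokenizer, text) <= max_tokens


def load_claude_tokenizer():
    """Load the tokenizer from the Hub (needs `transformers` and network access)."""
    # Imported lazily so the helpers above work without transformers installed.
    from transformers import GPT2TokenizerFast

    return GPT2TokenizerFast.from_pretrained('Xenova/claude-tokenizer')
```

With the tokenizer loaded via `load_claude_tokenizer()`, `count_tokens(tokenizer, 'hello world')` should return 2, since the example above encodes that string to the two ids `[9381, 2253]`.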
