Tokenizers, check!
Great job finishing this chapter!
After this deep dive into tokenizers, you should:
- Be able to train a new tokenizer using an old one as a template (see the first sketch below)
- Understand how to use offsets to map tokens’ positions back to their original spans of text (second sketch below)
- Know the differences between the BPE, WordPiece, and Unigram algorithms
- Be able to mix and match the blocks provided by the 🤗 Tokenizers library to build your own tokenizer (third sketch below)
- Be able to use that tokenizer inside the 🤗 Transformers library (also shown in the third sketch)
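To jog your memory on the first point, here is a minimal sketch of training a new tokenizer from an old one with `train_new_from_iterator()`. The checkpoint, the toy corpus, and the output directory are placeholders; a real run would stream batches of texts from a full dataset:

```python
from transformers import AutoTokenizer

# Load an existing fast tokenizer to reuse its pipeline as a template
old_tokenizer = AutoTokenizer.from_pretrained("gpt2")

# Toy corpus: an iterator yielding batches of texts
# (in practice, stream batches from a real dataset)
training_corpus = (
    ["def add_numbers(a, b):", '    """Add the two numbers `a` and `b`."""'],
    ["    return a + b"],
)

# Train a new tokenizer with the same algorithm but a fresh vocabulary
new_tokenizer = old_tokenizer.train_new_from_iterator(training_corpus, vocab_size=52000)
new_tokenizer.save_pretrained("my-new-tokenizer")
```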
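For offsets, recall that a fast tokenizer can return each token's character span in the original text — the mechanism behind the token classification and question answering pipelines. A quick sketch (the checkpoint is just an example):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
text = "Let's map tokens back to the text!"
encoding = tokenizer(text, return_offsets_mapping=True)

# Each token comes with its (start, end) character span;
# special tokens like [CLS] map to the empty span (0, 0)
for token, (start, end) in zip(encoding.tokens(), encoding["offset_mapping"]):
    print(f"{token!r:12} -> {text[start:end]!r}")
```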
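Finally, for building a tokenizer block by block and using it in 🤗 Transformers: the sketch below assembles a small WordPiece tokenizer and wraps it in a `PreTrainedTokenizerFast`. The choice of normalizers, the vocabulary size, and the toy training corpus are arbitrary picks for illustration:

```python
from tokenizers import Tokenizer, models, normalizers, pre_tokenizers, processors, trainers
from transformers import PreTrainedTokenizerFast

# Assemble the pipeline block by block: model, normalizer, pre-tokenizer
tokenizer = Tokenizer(models.WordPiece(unk_token="[UNK]"))
tokenizer.normalizer = normalizers.Sequence(
    [normalizers.NFD(), normalizers.Lowercase(), normalizers.StripAccents()]
)
tokenizer.pre_tokenizer = pre_tokenizers.Whitespace()

# Train on a toy corpus (a real run would iterate over a full dataset)
trainer = trainers.WordPieceTrainer(
    vocab_size=1000, special_tokens=["[UNK]", "[CLS]", "[SEP]", "[PAD]", "[MASK]"]
)
tokenizer.train_from_iterator(["Let's build a tokenizer!", "Block by block."], trainer=trainer)

# Post-processor: add [CLS]/[SEP] around encoded sequences
cls_id = tokenizer.token_to_id("[CLS]")
sep_id = tokenizer.token_to_id("[SEP]")
tokenizer.post_processor = processors.TemplateProcessing(
    single="[CLS] $A [SEP]",
    pair="[CLS] $A [SEP] $B:1 [SEP]:1",
    special_tokens=[("[CLS]", cls_id), ("[SEP]", sep_id)],
)

# Wrap it so it can be used anywhere a 🤗 Transformers tokenizer is expected
wrapped_tokenizer = PreTrainedTokenizerFast(
    tokenizer_object=tokenizer,
    unk_token="[UNK]",
    pad_token="[PAD]",
    cls_token="[CLS]",
    sep_token="[SEP]",
    mask_token="[MASK]",
)
print(wrapped_tokenizer("Block by block."))
```

Once wrapped, the tokenizer behaves like any other fast tokenizer in 🤗 Transformers, so you can save it with `save_pretrained()` and pass it to models, pipelines, and the `Trainer`.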