|
<!--Copyright 2020 The HuggingFace Team. All rights reserved. |
|
|
|
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with |
|
the License. You may obtain a copy of the License at |
|
|
|
http://www.apache.org/licenses/LICENSE-2.0
|
|
|
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on |
|
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the |
|
specific language governing permissions and limitations under the License. |
|
--> |
|
|
|
# 🤗 Transformers |
|
|
|
|
|
State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
|
🤗 Transformers provides APIs to easily download and train state-of-the-art pretrained models. Using pretrained models can reduce your compute costs and carbon footprint, and save you the time it would take to train a model from scratch. The models can be used across different modalities and tasks, such as the following (a minimal usage example appears after the list):

* 📝 Text: text classification, information extraction, question answering, summarization, translation, and text generation in over 100 languages.
* 🖼️ Images: image classification, object detection, and segmentation.
* 🗣️ Audio: speech recognition and audio classification.
* 🐙 Multimodal: table question answering, optical character recognition, information extraction from scanned documents, video classification, and visual question answering.
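
For instance, the high-level `pipeline` API downloads a default pretrained checkpoint for a task and runs inference in a few lines. A minimal sketch (the default model and the exact score depend on your installed version):

```python
# Download a default pretrained model for the task and run inference with it.
from transformers import pipeline

classifier = pipeline("sentiment-analysis")
result = classifier("We are very happy to show you the 🤗 Transformers library.")
print(result)  # e.g. [{'label': 'POSITIVE', 'score': 0.9998}]
```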
|
|
|
Our library supports seamless integration between three of the most popular deep learning libraries: [PyTorch](https://pytorch.org/), [TensorFlow](https://www.tensorflow.org/), and [JAX](https://jax.readthedocs.io/en/latest/). Train your model in three lines of code in one framework, and load it for inference with another.
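
A hedged sketch of that cross-framework workflow, assuming both the PyTorch and TensorFlow backends are installed (`./my-bert` is an arbitrary example path):

```python
from transformers import AutoModel, TFAutoModel

# Load a checkpoint in PyTorch and save it locally.
pt_model = AutoModel.from_pretrained("bert-base-uncased")
pt_model.save_pretrained("./my-bert")

# Reload the same checkpoint in TensorFlow; from_pt=True converts the
# PyTorch weights on the fly.
tf_model = TFAutoModel.from_pretrained("./my-bert", from_pt=True)
```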
|
|
|
Each 🤗 Transformers architecture is defined in a standalone Python module, so it can easily be customized for research and experiments.
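
For example, instead of going through the `Auto*` factories you can instantiate a concrete architecture directly and tweak its configuration; a small sketch:

```python
from transformers import BertConfig, BertModel

# A custom, smaller BERT: 6 layers instead of the usual 12.
config = BertConfig(num_hidden_layers=6)
model = BertModel(config)  # randomly initialized, ready for experiments
```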
|
|
|
## If you are looking for custom support from the Hugging Face team
|
|
|
<a target="_blank" href="https://huggingface.co/support"> |
|
<img alt="HuggingFace Expert Acceleration Program" src="https://huggingface.co/front/thumbnails/support.png" style="width: 100%; max-width: 600px; border: 1px solid #eee; border-radius: 4px; box-shadow: 0 1px 2px 0 rgba(0, 0, 0, 0.05);"/>
|
</a> |
|
|
|
## Contents
|
|
|
The documentation is organized in five parts:

- **GET STARTED** contains a quick tour and installation instructions to get up and running with 🤗 Transformers.
- **TUTORIALS** are a great place to start if you are new to our library. This section will help you build the basic skills you need to start using 🤗 Transformers.
- **HOW-TO GUIDES** show you how to achieve a specific goal, such as fine-tuning a pretrained model for language modeling or creating a custom model head.
- **CONCEPTUAL GUIDES** offer more discussion and explanation of the underlying concepts and ideas behind models, tasks, and the design philosophy of 🤗 Transformers.
- **API** describes each class and function, grouped in:

  - **MAIN CLASSES** for the main classes exposing the important APIs of the library.
  - **MODELS** for the classes and functions related to each model implemented in the library.
  - **INTERNAL HELPERS** for the classes and functions used internally.
|
|
|
The library currently contains JAX, PyTorch, and TensorFlow implementations, pretrained model weights, and usage and conversion scripts for the following models:
|
|
|
### Supported models
|
|
|
<!--This list is updated automatically from the README with _make fix-copies_. Do not update manually! --> |
|
|
|
1. **[ALBERT](model_doc/albert)** (from Google Research and the Toyota Technological Institute at Chicago) released with the paper [ALBERT: A Lite BERT for Self-supervised Learning of Language Representations](https: |
|
1. **[BART](model_doc/bart)** (from Facebook) released with the paper [BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension](https: |
|
1. **[BARThez](model_doc/barthez)** (from École polytechnique) released with the paper [BARThez: a Skilled Pretrained French Sequence-to-Sequence Model](https: |
|
1. **[BARTpho](model_doc/bartpho)** (from VinAI Research) released with the paper [BARTpho: Pre-trained Sequence-to-Sequence Models for Vietnamese](https: |
|
1. **[BEiT](model_doc/beit)** (from Microsoft) released with the paper [BEiT: BERT Pre-Training of Image Transformers](https: |
|
1. **[BERT](model_doc/bert)** (from Google) released with the paper [BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding](https: |
|
1. **[BERTweet](model_doc/bertweet)** (from VinAI Research) released with the paper [BERTweet: A pre-trained language model for English Tweets](https: |
|
1. **[BERT For Sequence Generation](model_doc/bert-generation)** (from Google) released with the paper [Leveraging Pre-trained Checkpoints for Sequence Generation Tasks](https: |
|
1. **[BigBird-RoBERTa](model_doc/big_bird)** (from Google Research) released with the paper [Big Bird: Transformers for Longer Sequences](https: |
|
1. **[BigBird-Pegasus](model_doc/bigbird_pegasus)** (from Google Research) released with the paper [Big Bird: Transformers for Longer Sequences](https: |
|
1. **[Blenderbot](model_doc/blenderbot)** (from Facebook) released with the paper [Recipes for building an open-domain chatbot](https: |
|
1. **[BlenderbotSmall](model_doc/blenderbot-small)** (from Facebook) released with the paper [Recipes for building an open-domain chatbot](https: |
|
1. **[BORT](model_doc/bort)** (from Alexa) released with the paper [Optimal Subarchitecture Extraction For BERT](https: |
|
1. **[ByT5](model_doc/byt5)** (from Google Research) released with the paper [ByT5: Towards a token-free future with pre-trained byte-to-byte models](https: |
|
1. **[CamemBERT](model_doc/camembert)** (from Inria/Facebook/Sorbonne) released with the paper [CamemBERT: a Tasty French Language Model](https: |
|
1. **[CANINE](model_doc/canine)** (from Google Research) released with the paper [CANINE: Pre-training an Efficient Tokenization-Free Encoder for Language Representation](https: |
|
1. **[ConvNeXT](model_doc/convnext)** (from Facebook AI) released with the paper [A ConvNet for the 2020s](https: |
|
1. **[ConvNeXTV2](model_doc/convnextv2)** (from Facebook AI) released with the paper [ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders](https: |
|
1. **[CLIP](model_doc/clip)** (from OpenAI) released with the paper [Learning Transferable Visual Models From Natural Language Supervision](https: |
|
1. **[ConvBERT](model_doc/convbert)** (from YituTech) released with the paper [ConvBERT: Improving BERT with Span-based Dynamic Convolution](https: |
|
1. **[CPM](model_doc/cpm)** (from Tsinghua University) released with the paper [CPM: A Large-scale Generative Chinese Pre-trained Language Model](https: |
|
1. **[CTRL](model_doc/ctrl)** (from Salesforce) released with the paper [CTRL: A Conditional Transformer Language Model for Controllable Generation](https: |
|
1. **[Data2Vec](model_doc/data2vec)** (from Facebook) released with the paper [Data2Vec: A General Framework for Self-supervised Learning in Speech, Vision and Language](https: |
|
1. **[DeBERTa](model_doc/deberta)** (from Microsoft) released with the paper [DeBERTa: Decoding-enhanced BERT with Disentangled Attention](https: |
|
1. **[DeBERTa-v2](model_doc/deberta-v2)** (from Microsoft) released with the paper [DeBERTa: Decoding-enhanced BERT with Disentangled Attention](https: |
|
1. **[Decision Transformer](model_doc/decision_transformer)** (from Berkeley/Facebook/Google) released with the paper [Decision Transformer: Reinforcement Learning via Sequence Modeling](https: |
|
1. **[DiT](model_doc/dit)** (from Microsoft Research) released with the paper [DiT: Self-supervised Pre-training for Document Image Transformer](https: |
|
1. **[DeiT](model_doc/deit)** (from Facebook) released with the paper [Training data-efficient image transformers & distillation through attention](https: |
|
1. **[DETR](model_doc/detr)** (from Facebook) released with the paper [End-to-End Object Detection with Transformers](https: |
|
1. **[DialoGPT](model_doc/dialogpt)** (from Microsoft Research) released with the paper [DialoGPT: Large-Scale Generative Pre-training for Conversational Response Generation](https: |
|
1. **[DistilBERT](model_doc/distilbert)** (from HuggingFace), released together with the paper [DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter](https: |
|
1. **[DPR](model_doc/dpr)** (from Facebook) released with the paper [Dense Passage Retrieval for Open-Domain Question Answering](https: |
|
1. **[DPT](model_doc/dpt)** (from Intel Labs) released with the paper [Vision Transformers for Dense Prediction](https:
|
1. **[EfficientNet](model_doc/efficientnet)** (from Google Research) released with the paper [EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks](https: |
|
1. **[EncoderDecoder](model_doc/encoder-decoder)** (from Google Research) released with the paper [Leveraging Pre-trained Checkpoints for Sequence Generation Tasks](https: |
|
1. **[ELECTRA](model_doc/electra)** (from Google Research/Stanford University) released with the paper [ELECTRA: Pre-training text encoders as discriminators rather than generators](https: |
|
1. **[FlauBERT](model_doc/flaubert)** (from CNRS) released with the paper [FlauBERT: Unsupervised Language Model Pre-training for French](https: |
|
1. **[FNet](model_doc/fnet)** (from Google Research) released with the paper [FNet: Mixing Tokens with Fourier Transforms](https: |
|
1. **[Funnel Transformer](model_doc/funnel)** (from CMU/Google Brain) released with the paper [Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing](https: |
|
1. **[GLPN](model_doc/glpn)** (from KAIST) released with the paper [Global-Local Path Networks for Monocular Depth Estimation with Vertical CutDepth](https: |
|
1. **[GPT](model_doc/openai-gpt)** (from OpenAI) released with the paper [Improving Language Understanding by Generative Pre-Training](https: |
|
1. **[GPT-2](model_doc/gpt2)** (from OpenAI) released with the paper [Language Models are Unsupervised Multitask Learners](https: |
|
1. **[GPT-J](model_doc/gptj)** (from EleutherAI) released in the repository [kingoflolz/mesh-transformer-jax](https: |
|
1. **[GPT Neo](model_doc/gpt_neo)** (from EleutherAI) released in the repository [EleutherAI/gpt-neo](https: |
|
1. **[GPTSAN-japanese](model_doc/gptsan-japanese)** released in the repository [tanreinama/GPTSAN](https: |
|
1. **[Hubert](model_doc/hubert)** (from Facebook) released with the paper [HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units](https: |
|
1. **[I-BERT](model_doc/ibert)** (from Berkeley) released with the paper [I-BERT: Integer-only BERT Quantization](https: |
|
1. **[ImageGPT](model_doc/imagegpt)** (from OpenAI) released with the paper [Generative Pretraining from Pixels](https: |
|
1. **[LayoutLM](model_doc/layoutlm)** (from Microsoft Research Asia) released with the paper [LayoutLM: Pre-training of Text and Layout for Document Image Understanding](https: |
|
1. **[LayoutLMv2](model_doc/layoutlmv2)** (from Microsoft Research Asia) released with the paper [LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding](https: |
|
1. **[LayoutXLM](model_doc/layoutxlm)** (from Microsoft Research Asia) released with the paper [LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich Document Understanding](https: |
|
1. **[LED](model_doc/led)** (from AllenAI) released with the paper [Longformer: The Long-Document Transformer](https: |
|
1. **[Longformer](model_doc/longformer)** (from AllenAI) released with the paper [Longformer: The Long-Document Transformer](https: |
|
1. **[LUKE](model_doc/luke)** (from Studio Ousia) released with the paper [LUKE: Deep Contextualized Entity Representations with Entity-aware Self-attention](https: |
|
1. **[mLUKE](model_doc/mluke)** (from Studio Ousia) released with the paper [mLUKE: The Power of Entity Representations in Multilingual Pretrained Language Models](https: |
|
1. **[LXMERT](model_doc/lxmert)** (from UNC Chapel Hill) released with the paper [LXMERT: Learning Cross-Modality Encoder Representations from Transformers for Open-Domain Question Answering](https: |
|
1. **[M2M100](model_doc/m2m_100)** (from Facebook) released with the paper [Beyond English-Centric Multilingual Machine Translation](https: |
|
1. **[MarianMT](model_doc/marian)** Machine translation models trained using [OPUS](http: |
|
1. **[Mask2Former](model_doc/mask2former)** (from FAIR and UIUC) released with the paper [Masked-attention Mask Transformer for Universal Image Segmentation](https: |
|
1. **[MaskFormer](model_doc/maskformer)** (from Meta and UIUC) released with the paper [Per-Pixel Classification is Not All You Need for Semantic Segmentation](https: |
|
1. **[MBart](model_doc/mbart)** (from Facebook) released with the paper [Multilingual Denoising Pre-training for Neural Machine Translation](https: |
|
1. **[MBart-50](model_doc/mbart)** (from Facebook) released with the paper [Multilingual Translation with Extensible Multilingual Pretraining and Finetuning](https: |
|
1. **[Megatron-BERT](model_doc/megatron-bert)** (from NVIDIA) released with the paper [Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism](https: |
|
1. **[Megatron-GPT2](model_doc/megatron_gpt2)** (from NVIDIA) released with the paper [Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism](https: |
|
1. **[MPNet](model_doc/mpnet)** (from Microsoft Research) released with the paper [MPNet: Masked and Permuted Pre-training for Language Understanding](https: |
|
1. **[MT5](model_doc/mt5)** (from Google AI) released with the paper [mT5: A massively multilingual pre-trained text-to-text transformer](https: |
|
1. **[Nyströmformer](model_doc/nystromformer)** (from the University of Wisconsin - Madison) released with the paper [Nyströmformer: A Nyström-Based Algorithm for Approximating Self-Attention](https: |
|
1. **[OneFormer](model_doc/oneformer)** (from SHI Labs) released with the paper [OneFormer: One Transformer to Rule Universal Image Segmentation](https: |
|
1. **[Pegasus](model_doc/pegasus)** (from Google) released with the paper [PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization](https: |
|
1. **[Perceiver IO](model_doc/perceiver)** (from Deepmind) released with the paper [Perceiver IO: A General Architecture for Structured Inputs & Outputs](https: |
|
1. **[PhoBERT](model_doc/phobert)** (from VinAI Research) released with the paper [PhoBERT: Pre-trained language models for Vietnamese](https: |
|
1. **[PLBart](model_doc/plbart)** (from UCLA NLP) released with the paper [Unified Pre-training for Program Understanding and Generation](https: |
|
1. **[PoolFormer](model_doc/poolformer)** (from Sea AI Labs) released with the paper [MetaFormer is Actually What You Need for Vision](https: |
|
1. **[ProphetNet](model_doc/prophetnet)** (from Microsoft Research) released with the paper [ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training](https: |
|
1. **[QDQBert](model_doc/qdqbert)** (from NVIDIA) released with the paper [Integer Quantization for Deep Learning Inference: Principles and Empirical Evaluation](https: |
|
1. **[REALM](model_doc/realm)** (from Google Research) released with the paper [REALM: Retrieval-Augmented Language Model Pre-Training](https:
|
1. **[Reformer](model_doc/reformer)** (from Google Research) released with the paper [Reformer: The Efficient Transformer](https: |
|
1. **[RemBERT](model_doc/rembert)** (from Google Research) released with the paper [Rethinking embedding coupling in pre-trained language models](https: |
|
1. **[RegNet](model_doc/regnet)** (from META Platforms) released with the paper [Designing Network Design Space](https: |
|
1. **[ResNet](model_doc/resnet)** (from Microsoft Research) released with the paper [Deep Residual Learning for Image Recognition](https: |
|
1. **[RoBERTa](model_doc/roberta)** (from Facebook), released together with the paper [RoBERTa: A Robustly Optimized BERT Pretraining Approach](https: |
|
1. **[RoFormer](model_doc/roformer)** (from ZhuiyiTechnology), released together with the paper [RoFormer: Enhanced Transformer with Rotary Position Embedding](https: |
|
1. **[SegFormer](model_doc/segformer)** (from NVIDIA) released with the paper [SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers](https: |
|
1. **[SEW](model_doc/sew)** (from ASAPP) released with the paper [Performance-Efficiency Trade-offs in Unsupervised Pre-training for Speech Recognition](https: |
|
1. **[SEW-D](model_doc/sew_d)** (from ASAPP) released with the paper [Performance-Efficiency Trade-offs in Unsupervised Pre-training for Speech Recognition](https: |
|
1. **[SpeechToTextTransformer](model_doc/speech_to_text)** (from Facebook), released together with the paper [fairseq S2T: Fast Speech-to-Text Modeling with fairseq](https: |
|
1. **[SpeechToTextTransformer2](model_doc/speech_to_text_2)** (from Facebook), released together with the paper [Large-Scale Self- and Semi-Supervised Learning for Speech Translation](https: |
|
1. **[Splinter](model_doc/splinter)** (from Tel Aviv University), released together with the paper [Few-Shot Question Answering by Pretraining Span Selection](https: |
|
1. **[SqueezeBert](model_doc/squeezebert)** (from Berkeley) released with the paper [SqueezeBERT: What can computer vision teach NLP about efficient neural networks?](https: |
|
1. **[Swin Transformer](model_doc/swin)** (from Microsoft) released with the paper [Swin Transformer: Hierarchical Vision Transformer using Shifted Windows](https: |
|
1. **[T5](model_doc/t5)** (from Google AI) released with the paper [Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer](https: |
|
1. **[T5v1.1](model_doc/t5v1.1)** (from Google AI) released in the repository [google-research/text-to-text-transfer-transformer](https: |
|
1. **[TAPAS](model_doc/tapas)** (from Google AI) released with the paper [TAPAS: Weakly Supervised Table Parsing via Pre-training](https: |
|
1. **[TAPEX](model_doc/tapex)** (from Microsoft Research) released with the paper [TAPEX: Table Pre-training via Learning a Neural SQL Executor](https: |
|
1. **[Transformer-XL](model_doc/transfo-xl)** (from Google/CMU) released with the paper [Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context](https: |
|
1. **[TrOCR](model_doc/trocr)** (from Microsoft), released together with the paper [TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models](https: |
|
1. **[UniSpeech](model_doc/unispeech)** (from Microsoft Research) released with the paper [UniSpeech: Unified Speech Representation Learning with Labeled and Unlabeled Data](https: |
|
1. **[UniSpeechSat](model_doc/unispeech-sat)** (from Microsoft Research) released with the paper [UNISPEECH-SAT: UNIVERSAL SPEECH REPRESENTATION LEARNING WITH SPEAKER AWARE PRE-TRAINING](https: |
|
1. **[VAN](model_doc/van)** (from Tsinghua University and Nankai University) released with the paper [Visual Attention Network](https: |
|
1. **[ViLT](model_doc/vilt)** (from NAVER AI Lab/Kakao Enterprise/Kakao Brain) released with the paper [ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision](https: |
|
1. **[Vision Transformer (ViT)](model_doc/vit)** (from Google AI) released with the paper [An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale](https: |
|
1. **[ViTMAE](model_doc/vit_mae)** (from Meta AI) released with the paper [Masked Autoencoders Are Scalable Vision Learners](https: |
|
1. **[VisualBERT](model_doc/visual_bert)** (from UCLA NLP) released with the paper [VisualBERT: A Simple and Performant Baseline for Vision and Language](https: |
|
1. **[WavLM](model_doc/wavlm)** (from Microsoft Research) released with the paper [WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing](https: |
|
1. **[Wav2Vec2](model_doc/wav2vec2)** (from Facebook AI) released with the paper [wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations](https: |
|
1. **[Wav2Vec2Phoneme](model_doc/wav2vec2_phoneme)** (from Facebook AI) released with the paper [Simple and Effective Zero-shot Cross-lingual Phoneme Recognition](https: |
|
1. **[XGLM](model_doc/xglm)** (From Facebook AI) released with the paper [Few-shot Learning with Multilingual Language Models](https: |
|
1. **[XLM](model_doc/xlm)** (from Facebook) released together with the paper [Cross-lingual Language Model Pretraining](https: |
|
1. **[XLM-ProphetNet](model_doc/xlm-prophetnet)** (from Microsoft Research) released with the paper [ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training](https: |
|
1. **[XLM-RoBERTa](model_doc/xlm-roberta)** (from Facebook AI), released together with the paper [Unsupervised Cross-lingual Representation Learning at Scale](https: |
|
1. **[XLM-RoBERTa-XL](model_doc/xlm-roberta-xl)** (from Facebook AI), released together with the paper [Larger-Scale Transformers for Multilingual Masked Language Modeling](https: |
|
1. **[XLNet](model_doc/xlnet)** (from Google/CMU) released with the paper [XLNet: Generalized Autoregressive Pretraining for Language Understanding](https: |
|
1. **[XLSR-Wav2Vec2](model_doc/xlsr_wav2vec2)** (from Facebook AI) released with the paper [Unsupervised Cross-Lingual Representation Learning For Speech Recognition](https: |
|
1. **[XLS-R](model_doc/xls_r)** (from Facebook AI) released with the paper [XLS-R: Self-supervised Cross-lingual Speech Representation Learning at Scale](https: |
|
1. **[YOSO](model_doc/yoso)** (from the University of Wisconsin - Madison) released with the paper [You Only Sample (Almost) Once: Linear Cost Self-Attention Via Bernoulli Sampling](https: |
|
|
|
|
|
### Supported frameworks
|
|
|
The table below shows the current support in the library for each of these models: whether it has a Python tokenizer (called "slow"), a "fast" tokenizer built on top of the 🤗 Tokenizers library, and whether it is supported in JAX (via Flax), PyTorch, and/or TensorFlow.
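
As a brief illustration (using the `bert-base-uncased` checkpoint as an example), `AutoTokenizer` returns the fast tokenizer when one is available, and `use_fast=False` forces the pure-Python implementation:

```python
from transformers import AutoTokenizer

fast = AutoTokenizer.from_pretrained("bert-base-uncased")                 # fast by default
slow = AutoTokenizer.from_pretrained("bert-base-uncased", use_fast=False) # pure Python
print(fast.is_fast, slow.is_fast)  # True False
```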
|
|
|
<!--This table is updated automatically from the auto modules with _make fix-copies_. Do not update manually!--> |
|
|
|
| Model | Tokenizer slow | Tokenizer fast | PyTorch support | TensorFlow support | Flax support |
|
|:---------------------------:|:--------------:|:--------------:|:---------------:|:------------------:|:------------:| |
|
| ALBERT | ✅ | ✅ | ✅ | ✅ | ✅ | |
|
| BART | ✅ | ✅ | ✅ | ✅ | ✅ | |
|
| BEiT | ❌ | ❌ | ✅ | ❌ | ✅ | |
|
| BERT | ✅ | ✅ | ✅ | ✅ | ✅ | |
|
| Bert Generation | ✅ | ❌ | ✅ | ❌ | ❌ | |
|
| BigBird | ✅ | ✅ | ✅ | ❌ | ✅ | |
|
| BigBirdPegasus | ❌ | ❌ | ✅ | ❌ | ❌ | |
|
| Blenderbot | ✅ | ✅ | ✅ | ✅ | ✅ | |
|
| BlenderbotSmall | ✅ | ✅ | ✅ | ✅ | ✅ | |
|
| CamemBERT | ✅ | ✅ | ✅ | ✅ | ❌ | |
|
| Canine | ✅ | ❌ | ✅ | ❌ | ❌ | |
|
| CLIP | ✅ | ✅ | ✅ | ✅ | ✅ | |
|
| ConvBERT | ✅ | ✅ | ✅ | ✅ | ❌ | |
|
| ConvNext | ❌ | ❌ | ✅ | ✅ | ❌ | |
|
| CTRL | ✅ | ❌ | ✅ | ✅ | ❌ | |
|
| Data2VecAudio | ❌ | ❌ | ✅ | ❌ | ❌ | |
|
| Data2VecText | ❌ | ❌ | ✅ | ❌ | ❌ | |
|
| Data2VecVision | ❌ | ❌ | ✅ | ❌ | ❌ | |
|
| DeBERTa | ✅ | ✅ | ✅ | ✅ | ❌ | |
|
| DeBERTa-v2 | ✅ | ✅ | ✅ | ✅ | ❌ | |
|
| Decision Transformer | ❌ | ❌ | ✅ | ❌ | ❌ | |
|
| DeiT | ❌ | ❌ | ✅ | ❌ | ❌ | |
|
| DETR | ❌ | ❌ | ✅ | ❌ | ❌ | |
|
| DistilBERT | ✅ | ✅ | ✅ | ✅ | ✅ | |
|
| DPR | ✅ | ✅ | ✅ | ✅ | ❌ | |
|
| DPT | ❌ | ❌ | ✅ | ❌ | ❌ | |
|
| ELECTRA | ✅ | ✅ | ✅ | ✅ | ✅ | |
|
| Encoder decoder | ❌ | ❌ | ✅ | ✅ | ✅ | |
|
| FairSeq Machine-Translation | ✅ | ❌ | ✅ | ❌ | ❌ | |
|
| FlauBERT | ✅ | ❌ | ✅ | ✅ | ❌ | |
|
| FNet | ✅ | ✅ | ✅ | ❌ | ❌ | |
|
| Funnel Transformer | ✅ | ✅ | ✅ | ✅ | ❌ | |
|
| GLPN | ❌ | ❌ | ✅ | ❌ | ❌ | |
|
| GPT Neo | ❌ | ❌ | ✅ | ❌ | ✅ | |
|
| GPT-J | ❌ | ❌ | ✅ | ✅ | ✅ | |
|
| Hubert | ❌ | ❌ | ✅ | ✅ | ❌ | |
|
| I-BERT | ❌ | ❌ | ✅ | ❌ | ❌ | |
|
| ImageGPT | ❌ | ❌ | ✅ | ❌ | ❌ | |
|
| LayoutLM | ✅ | ✅ | ✅ | ✅ | ❌ | |
|
| LayoutLMv2 | ✅ | ✅ | ✅ | ❌ | ❌ | |
|
| LED | ✅ | ✅ | ✅ | ✅ | ❌ | |
|
| Longformer | ✅ | ✅ | ✅ | ✅ | ❌ | |
|
| LUKE | ✅ | ❌ | ✅ | ❌ | ❌ | |
|
| LXMERT | ✅ | ✅ | ✅ | ✅ | ❌ | |
|
| M2M100 | ✅ | ❌ | ✅ | ❌ | ❌ | |
|
| Marian | ✅ | ❌ | ✅ | ✅ | ✅ | |
|
| MaskFormer | ❌ | ❌ | ✅ | ❌ | ❌ | |
|
| mBART | ✅ | ✅ | ✅ | ✅ | ✅ | |
|
| MegatronBert | ❌ | ❌ | ✅ | ❌ | ❌ | |
|
| MobileBERT | ✅ | ✅ | ✅ | ✅ | ❌ | |
|
| MPNet | ✅ | ✅ | ✅ | ✅ | ❌ | |
|
| mT5 | ✅ | ✅ | ✅ | ✅ | ✅ | |
|
| Nystromformer | ❌ | ❌ | ✅ | ❌ | ❌ | |
|
| OpenAI GPT | ✅ | ✅ | ✅ | ✅ | ❌ | |
|
| OpenAI GPT-2 | ✅ | ✅ | ✅ | ✅ | ✅ | |
|
| Pegasus | ✅ | ✅ | ✅ | ✅ | ✅ | |
|
| Perceiver | ✅ | ❌ | ✅ | ❌ | ❌ | |
|
| PLBart | ✅ | ❌ | ✅ | ❌ | ❌ | |
|
| PoolFormer | ❌ | ❌ | ✅ | ❌ | ❌ | |
|
| ProphetNet | ✅ | ❌ | ✅ | ❌ | ❌ | |
|
| QDQBert | ❌ | ❌ | ✅ | ❌ | ❌ | |
|
| RAG | ✅ | ❌ | ✅ | ✅ | ❌ | |
|
| Realm | ✅ | ✅ | ✅ | ❌ | ❌ | |
|
| Reformer | ✅ | ✅ | ✅ | ❌ | ❌ | |
|
| RegNet | ❌ | ❌ | ✅ | ❌ | ❌ | |
|
| RemBERT | ✅ | ✅ | ✅ | ✅ | ❌ | |
|
| ResNet | ❌ | ❌ | ✅ | ❌ | ✅ | |
|
| RetriBERT | ✅ | ✅ | ✅ | ❌ | ❌ | |
|
| RoBERTa | ✅ | ✅ | ✅ | ✅ | ✅ | |
|
| RoFormer | ✅ | ✅ | ✅ | ✅ | ✅ | |
|
| SegFormer | ❌ | ❌ | ✅ | ❌ | ❌ | |
|
| SEW | ❌ | ❌ | ✅ | ❌ | ❌ | |
|
| SEW-D | ❌ | ❌ | ✅ | ❌ | ❌ | |
|
| Speech Encoder decoder | ❌ | ❌ | ✅ | ❌ | ✅ | |
|
| Speech2Text | ✅ | ❌ | ✅ | ✅ | ❌ | |
|
| Speech2Text2 | ✅ | ❌ | ❌ | ❌ | ❌ | |
|
| Splinter | ✅ | ✅ | ✅ | ❌ | ❌ | |
|
| SqueezeBERT | ✅ | ✅ | ✅ | ❌ | ❌ | |
|
| Swin | ❌ | ❌ | ✅ | ❌ | ❌ | |
|
| T5 | ✅ | ✅ | ✅ | ✅ | ✅ | |
|
| TAPAS | ✅ | ❌ | ✅ | ✅ | ❌ | |
|
| TAPEX | ✅ | ✅ | ✅ | ✅ | ✅ | |
|
| Transformer-XL | ✅ | ❌ | ✅ | ✅ | ❌ | |
|
| TrOCR | ❌ | ❌ | ✅ | ❌ | ❌ | |
|
| UniSpeech | ❌ | ❌ | ✅ | ❌ | ❌ | |
|
| UniSpeechSat | ❌ | ❌ | ✅ | ❌ | ❌ | |
|
| VAN | ❌ | ❌ | ✅ | ❌ | ❌ | |
|
| ViLT | ❌ | ❌ | ✅ | ❌ | ❌ | |
|
| Vision Encoder decoder | ❌ | ❌ | ✅ | ✅ | ✅ | |
|
| VisionTextDualEncoder | ❌ | ❌ | ✅ | ❌ | ✅ | |
|
| VisualBert | ❌ | ❌ | ✅ | ❌ | ❌ | |
|
| ViT | ❌ | ❌ | ✅ | ✅ | ✅ | |
|
| ViTMAE | ❌ | ❌ | ✅ | ✅ | ❌ | |
|
| Wav2Vec2 | ✅ | ❌ | ✅ | ✅ | ✅ | |
|
| WavLM | ❌ | ❌ | ✅ | ❌ | ❌ | |
|
| XGLM | ✅ | ✅ | ✅ | ❌ | ✅ | |
|
| XLM | ✅ | ❌ | ✅ | ✅ | ❌ | |
|
| XLM-RoBERTa | ✅ | ✅ | ✅ | ✅ | ✅ | |
|
| XLM-RoBERTa-XL | ❌ | ❌ | ✅ | ❌ | ❌ | |
|
| XLMProphetNet | ✅ | ❌ | ✅ | ❌ | ❌ | |
|
| XLNet | ✅ | ✅ | ✅ | ✅ | ❌ | |
|
| YOSO | ❌ | ❌ | ✅ | ❌ | ❌ | |
|
|
|
<!-- End table--> |
|
|