|
<!--Copyright 2020 The HuggingFace Team. All rights reserved. |
|
|
|
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with |
|
the License. You may obtain a copy of the License at |
|
|
|
http://www.apache.org/licenses/LICENSE-2.0
|
|
|
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on |
|
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the |
|
specific language governing permissions and limitations under the License. |
|
--> |
|
|
|
# 🤗 Transformers |
|
|
|
State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
|
|
|
🤗 Transformers provides APIs to easily download and train state-of-the-art pretrained models. Using pretrained models can reduce your compute costs and carbon footprint, and save you the time it takes to train a model from scratch. The models can be used across different modalities, such as:

* 📝 Text: text classification, information extraction, question answering, summarization, translation, and text generation in over 100 languages.
* 🖼️ Images: image classification, object detection, and segmentation.
* 🗣️ Audio: speech recognition and audio classification.
* 🐙 Multimodal: table question answering, optical character recognition, information extraction from scanned documents, video classification, and visual question answering.
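As a quick taste of these APIs, here is a minimal sketch using the `pipeline` helper; when no checkpoint is named, a default pretrained model for the task is downloaded on first use, so the exact model and score may vary:

```python
from transformers import pipeline

# Downloads a default pretrained sentiment-analysis model on first use,
# then runs inference on a single sentence.
classifier = pipeline("sentiment-analysis")
print(classifier("We are very happy to show you the 🤗 Transformers library."))
# Something like: [{'label': 'POSITIVE', 'score': 0.9998}]
```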
|
Our library supports seamless integration between three of the most popular deep learning libraries: [PyTorch](https://pytorch.org/), [TensorFlow](https://www.tensorflow.org/) and [JAX](https://jax.readthedocs.io/en/latest/). Train your model in three lines of code in one framework, and load it for inference with another, as in the sketch below.
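A minimal sketch of that interoperability, assuming both PyTorch and TensorFlow are installed; the local path `./shared-checkpoint` is only illustrative:

```python
from transformers import (
    AutoModelForSequenceClassification,
    TFAutoModelForSequenceClassification,
)

# Load (or fine-tune) a model in PyTorch and save it locally...
pt_model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased")
pt_model.save_pretrained("./shared-checkpoint")

# ...then load the same weights in TensorFlow, converting from the PyTorch checkpoint.
tf_model = TFAutoModelForSequenceClassification.from_pretrained(
    "./shared-checkpoint", from_pt=True
)
```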
|
Each 🤗 Transformers architecture is defined in a standalone Python module, so it can easily be customized for research and experiments.
|
|
|
## If you are looking for custom support from the Hugging Face team
|
|
|
<a target="_blank" href="https://huggingface.co/support"> |
|
<img alt="HuggingFace Expert Acceleration Program" src="https://huggingface.co/front/thumbnails/support.png" style="width: 100%; max-width: 600px; border: 1px solid #eee; border-radius: 4px; box-shadow: 0 1px 2px 0 rgba(0, 0, 0, 0.05);"> |
|
</a> |
|
|
|
## Contents
|
|
|
The documentation is organized in four parts:
|
|
|
- **GET STARTED** contains a quick tour and installation instructions to get up and running with 🤗 Transformers.
- **TUTORIALS** are a great place to begin. This section will help you gain the basic skills you need to start using 🤗 Transformers.
- **HOW-TO GUIDES** will show you how to achieve a specific goal, like fine-tuning a pretrained model for language modeling or creating a custom model head.
- **CONCEPTUAL GUIDES** provide more discussion and explanation of the underlying concepts and ideas behind models, tasks, and the design philosophy of 🤗 Transformers.
|
|
|
The library currently contains JAX, PyTorch and TensorFlow implementations, pretrained model weights, usage scripts, and conversion utilities for the following models.
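Any model in the list below can be loaded by checkpoint name; a minimal sketch (assuming PyTorch is installed), where the Auto classes resolve the matching architecture automatically:

```python
import torch
from transformers import AutoModel, AutoTokenizer

# The checkpoint name selects both the architecture and its pretrained weights.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("Hello world!", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # e.g. torch.Size([1, 5, 768]) for this checkpoint
```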
|
|
|
### Supported models
|
|
|
<!--This list is updated automatically from the README with _make fix-copies_. Do not update manually! --> |
|
|
|
1. **[ALBERT](model_doc/albert)** (from Google Research and the Toyota Technological Institute at Chicago) released with the paper [ALBERT: A Lite BERT for Self-supervised Learning of Language Representations](https:
1. **[ALIGN](model_doc/align)** (from Google Research) released with the paper [Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision](https:
1. **[BART](model_doc/bart)** (from Facebook) released with the paper [BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension](https:
1. **[BARThez](model_doc/barthez)** (from École polytechnique) released with the paper [BARThez: a Skilled Pretrained French Sequence-to-Sequence Model](https:
1. **[BARTpho](model_doc/bartpho)** (from VinAI Research) released with the paper [BARTpho: Pre-trained Sequence-to-Sequence Models for Vietnamese](https:
1. **[BEiT](model_doc/beit)** (from Microsoft) released with the paper [BEiT: BERT Pre-Training of Image Transformers](https:
1. **[BERT](model_doc/bert)** (from Google) released with the paper [BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding](https:
1. **[BERTweet](model_doc/bertweet)** (from VinAI Research) released with the paper [BERTweet: A pre-trained language model for English Tweets](https:
1. **[BERT For Sequence Generation](model_doc/bert-generation)** (from Google) released with the paper [Leveraging Pre-trained Checkpoints for Sequence Generation Tasks](https:
1. **[BigBird-RoBERTa](model_doc/big_bird)** (from Google Research) released with the paper [Big Bird: Transformers for Longer Sequences](https:
1. **[BigBird-Pegasus](model_doc/bigbird_pegasus)** (from Google Research) released with the paper [Big Bird: Transformers for Longer Sequences](https:
1. **[Blenderbot](model_doc/blenderbot)** (from Facebook) released with the paper [Recipes for building an open-domain chatbot](https:
1. **[BlenderbotSmall](model_doc/blenderbot-small)** (from Facebook) released with the paper [Recipes for building an open-domain chatbot](https:
1. **[BORT](model_doc/bort)** (from Alexa) released with the paper [Optimal Subarchitecture Extraction For BERT](https:
1. **[ByT5](model_doc/byt5)** (from Google Research) released with the paper [ByT5: Towards a token-free future with pre-trained byte-to-byte models](https:
1. **[CamemBERT](model_doc/camembert)** (from Inria/Facebook/Sorbonne) released with the paper [CamemBERT: a Tasty French Language Model](https:
1. **[CANINE](model_doc/canine)** (from Google Research) released with the paper [CANINE: Pre-training an Efficient Tokenization-Free Encoder for Language Representation](https:
1. **[ConvNeXT](model_doc/convnext)** (from Facebook AI) released with the paper [A ConvNet for the 2020s](https:
1. **[ConvNeXTV2](model_doc/convnextv2)** (from Facebook AI) released with the paper [ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders](https:
1. **[CLIP](model_doc/clip)** (from OpenAI) released with the paper [Learning Transferable Visual Models From Natural Language Supervision](https:
1. **[ConvBERT](model_doc/convbert)** (from YituTech) released with the paper [ConvBERT: Improving BERT with Span-based Dynamic Convolution](https:
1. **[CPM](model_doc/cpm)** (from Tsinghua University) released with the paper [CPM: A Large-scale Generative Chinese Pre-trained Language Model](https:
1. **[CTRL](model_doc/ctrl)** (from Salesforce) released with the paper [CTRL: A Conditional Transformer Language Model for Controllable Generation](https:
1. **[Data2Vec](model_doc/data2vec)** (from Facebook) released with the paper [Data2Vec: A General Framework for Self-supervised Learning in Speech, Vision and Language](https:
1. **[DeBERTa](model_doc/deberta)** (from Microsoft) released with the paper [DeBERTa: Decoding-enhanced BERT with Disentangled Attention](https:
1. **[DeBERTa-v2](model_doc/deberta-v2)** (from Microsoft) released with the paper [DeBERTa: Decoding-enhanced BERT with Disentangled Attention](https:
1. **[Decision Transformer](model_doc/decision_transformer)** (from Berkeley/Facebook/Google) released with the paper [Decision Transformer: Reinforcement Learning via Sequence Modeling](https:
1. **[DiT](model_doc/dit)** (from Microsoft Research) released with the paper [DiT: Self-supervised Pre-training for Document Image Transformer](https:
1. **[DeiT](model_doc/deit)** (from Facebook) released with the paper [Training data-efficient image transformers & distillation through attention](https:
1. **[DETR](model_doc/detr)** (from Facebook) released with the paper [End-to-End Object Detection with Transformers](https:
1. **[DialoGPT](model_doc/dialogpt)** (from Microsoft Research) released with the paper [DialoGPT: Large-Scale Generative Pre-training for Conversational Response Generation](https:
1. **[DistilBERT](model_doc/distilbert)** (from HuggingFace), released together with the paper [DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter](https:
1. **[DPR](model_doc/dpr)** (from Facebook) released with the paper [Dense Passage Retrieval for Open-Domain Question Answering](https:
1. **[DPT](model_doc/dpt)** (from Intel Labs) released with the paper [Vision Transformers for Dense Prediction](https:
1. **[EfficientNet](model_doc/efficientnet)** (from Google Research) released with the paper [EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks](https:
1. **[EncoderDecoder](model_doc/encoder-decoder)** (from Google Research) released with the paper [Leveraging Pre-trained Checkpoints for Sequence Generation Tasks](https:
1. **[ELECTRA](model_doc/electra)** (from Google Research/Stanford University) released with the paper [ELECTRA: Pre-training text encoders as discriminators rather than generators](https:
1. **[FlauBERT](model_doc/flaubert)** (from CNRS) released with the paper [FlauBERT: Unsupervised Language Model Pre-training for French](https:
1. **[FNet](model_doc/fnet)** (from Google Research) released with the paper [FNet: Mixing Tokens with Fourier Transforms](https:
1. **[Funnel Transformer](model_doc/funnel)** (from CMU/Google Brain) released with the paper [Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing](https:
1. **[GLPN](model_doc/glpn)** (from KAIST) released with the paper [Global-Local Path Networks for Monocular Depth Estimation with Vertical CutDepth](https:
1. **[GPT](model_doc/openai-gpt)** (from OpenAI) released with the paper [Improving Language Understanding by Generative Pre-Training](https:
1. **[GPT-2](model_doc/gpt2)** (from OpenAI) released with the paper [Language Models are Unsupervised Multitask Learners](https:
1. **[GPT-J](model_doc/gptj)** (from EleutherAI) released with the repository [kingoflolz/mesh-transformer-jax](https:
1. **[GPT Neo](model_doc/gpt_neo)** (from EleutherAI) released in the repository [EleutherAI/gpt-neo](https:
1. **[GPTSAN-japanese](model_doc/gptsan-japanese)** released with [GPTSAN](https:
1. **[Hubert](model_doc/hubert)** (from Facebook) released with the paper [HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units](https:
1. **[I-BERT](model_doc/ibert)** (from Berkeley) released with the paper [I-BERT: Integer-only BERT Quantization](https:
1. **[ImageGPT](model_doc/imagegpt)** (from OpenAI) released with the paper [Generative Pretraining from Pixels](https:
1. **[LayoutLM](model_doc/layoutlm)** (from Microsoft Research Asia) released with the paper [LayoutLM: Pre-training of Text and Layout for Document Image Understanding](https:
1. **[LayoutLMv2](model_doc/layoutlmv2)** (from Microsoft Research Asia) released with the paper [LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding](https:
1. **[LayoutXLM](model_doc/layoutxlm)** (from Microsoft Research Asia) released with the paper [LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich Document Understanding](https:
1. **[LED](model_doc/led)** (from AllenAI) released with the paper [Longformer: The Long-Document Transformer](https:
1. **[Longformer](model_doc/longformer)** (from AllenAI) released with the paper [Longformer: The Long-Document Transformer](https:
1. **[LUKE](model_doc/luke)** (from Studio Ousia) released with the paper [LUKE: Deep Contextualized Entity Representations with Entity-aware Self-attention](https:
1. **[mLUKE](model_doc/mluke)** (from Studio Ousia) released with the paper [mLUKE: The Power of Entity Representations in Multilingual Pretrained Language Models](https:
1. **[LXMERT](model_doc/lxmert)** (from UNC Chapel Hill) released with the paper [LXMERT: Learning Cross-Modality Encoder Representations from Transformers for Open-Domain Question Answering](https:
1. **[M2M100](model_doc/m2m_100)** (from Facebook) released with the paper [Beyond English-Centric Multilingual Machine Translation](https:
1. **[MarianMT](model_doc/marian)** Machine translation models trained using [OPUS](http:
1. **[Mask2Former](model_doc/mask2former)** (from FAIR and UIUC) released with the paper [Masked-attention Mask Transformer for Universal Image Segmentation](https:
1. **[MaskFormer](model_doc/maskformer)** (from Meta and UIUC) released with the paper [Per-Pixel Classification is Not All You Need for Semantic Segmentation](https:
1. **[MBart](model_doc/mbart)** (from Facebook) released with the paper [Multilingual Denoising Pre-training for Neural Machine Translation](https:
1. **[MBart-50](model_doc/mbart)** (from Facebook) released with the paper [Multilingual Translation with Extensible Multilingual Pretraining and Finetuning](https:
1. **[Megatron-BERT](model_doc/megatron-bert)** (from NVIDIA) released with the paper [Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism](https:
1. **[Megatron-GPT2](model_doc/megatron_gpt2)** (from NVIDIA) released with the paper [Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism](https:
1. **[MPNet](model_doc/mpnet)** (from Microsoft Research) released with the paper [MPNet: Masked and Permuted Pre-training for Language Understanding](https:
1. **[MT5](model_doc/mt5)** (from Google AI) released with the paper [mT5: A massively multilingual pre-trained text-to-text transformer](https:
1. **[Nyströmformer](model_doc/nystromformer)** (from the University of Wisconsin - Madison) released with the paper [Nyströmformer: A Nyström-Based Algorithm for Approximating Self-Attention](https:
1. **[OneFormer](model_doc/oneformer)** (from SHI Labs) released with the paper [OneFormer: One Transformer to Rule Universal Image Segmentation](https:
1. **[Pegasus](model_doc/pegasus)** (from Google) released with the paper [PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization](https:
1. **[Perceiver IO](model_doc/perceiver)** (from Deepmind) released with the paper [Perceiver IO: A General Architecture for Structured Inputs & Outputs](https:
1. **[PhoBERT](model_doc/phobert)** (from VinAI Research) released with the paper [PhoBERT: Pre-trained language models for Vietnamese](https:
1. **[PLBart](model_doc/plbart)** (from UCLA NLP) released with the paper [Unified Pre-training for Program Understanding and Generation](https:
1. **[PoolFormer](model_doc/poolformer)** (from Sea AI Labs) released with the paper [MetaFormer is Actually What You Need for Vision](https:
1. **[ProphetNet](model_doc/prophetnet)** (from Microsoft Research) released with the paper [ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training](https:
1. **[QDQBert](model_doc/qdqbert)** (from NVIDIA) released with the paper [Integer Quantization for Deep Learning Inference: Principles and Empirical Evaluation](https:
1. **[REALM](model_doc/realm)** (from Google Research) released with the paper [REALM: Retrieval-Augmented Language Model Pre-Training](https:
1. **[Reformer](model_doc/reformer)** (from Google Research) released with the paper [Reformer: The Efficient Transformer](https:
1. **[RemBERT](model_doc/rembert)** (from Google Research) released with the paper [Rethinking embedding coupling in pre-trained language models](https:
1. **[RegNet](model_doc/regnet)** (from META Platforms) released with the paper [Designing Network Design Spaces](https:
1. **[ResNet](model_doc/resnet)** (from Microsoft Research) released with the paper [Deep Residual Learning for Image Recognition](https:
1. **[RoBERTa](model_doc/roberta)** (from Facebook), released together with the paper [RoBERTa: A Robustly Optimized BERT Pretraining Approach](https:
1. **[RoFormer](model_doc/roformer)** (from ZhuiyiTechnology), released together with the paper [RoFormer: Enhanced Transformer with Rotary Position Embedding](https:
1. **[SegFormer](model_doc/segformer)** (from NVIDIA) released with the paper [SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers](https:
1. **[SEW](model_doc/sew)** (from ASAPP) released with the paper [Performance-Efficiency Trade-offs in Unsupervised Pre-training for Speech Recognition](https:
1. **[SEW-D](model_doc/sew_d)** (from ASAPP) released with the paper [Performance-Efficiency Trade-offs in Unsupervised Pre-training for Speech Recognition](https:
1. **[SpeechToTextTransformer](model_doc/speech_to_text)** (from Facebook), released together with the paper [fairseq S2T: Fast Speech-to-Text Modeling with fairseq](https:
1. **[SpeechToTextTransformer2](model_doc/speech_to_text_2)** (from Facebook), released together with the paper [Large-Scale Self- and Semi-Supervised Learning for Speech Translation](https:
1. **[Splinter](model_doc/splinter)** (from Tel Aviv University), released together with the paper [Few-Shot Question Answering by Pretraining Span Selection](https:
1. **[SqueezeBert](model_doc/squeezebert)** (from Berkeley) released with the paper [SqueezeBERT: What can computer vision teach NLP about efficient neural networks?](https:
1. **[Swin Transformer](model_doc/swin)** (from Microsoft) released with the paper [Swin Transformer: Hierarchical Vision Transformer using Shifted Windows](https:
1. **[T5](model_doc/t5)** (from Google AI) released with the paper [Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer](https:
1. **[T5v1.1](model_doc/t5v1.1)** (from Google AI) released in the repository [google-research/text-to-text-transfer-transformer](https:
1. **[TAPAS](model_doc/tapas)** (from Google AI) released with the paper [TAPAS: Weakly Supervised Table Parsing via Pre-training](https:
1. **[TAPEX](model_doc/tapex)** (from Microsoft Research) released with the paper [TAPEX: Table Pre-training via Learning a Neural SQL Executor](https:
1. **[Transformer-XL](model_doc/transfo-xl)** (from Google/CMU) released with the paper [Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context](https:
1. **[TrOCR](model_doc/trocr)** (from Microsoft), released together with the paper [TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models](https:
1. **[UniSpeech](model_doc/unispeech)** (from Microsoft Research) released with the paper [UniSpeech: Unified Speech Representation Learning with Labeled and Unlabeled Data](https:
1. **[UniSpeechSat](model_doc/unispeech-sat)** (from Microsoft Research) released with the paper [UNISPEECH-SAT: UNIVERSAL SPEECH REPRESENTATION LEARNING WITH SPEAKER AWARE PRE-TRAINING](https:
1. **[VAN](model_doc/van)** (from Tsinghua University and Nankai University) released with the paper [Visual Attention Network](https:
1. **[ViLT](model_doc/vilt)** (from NAVER AI Lab/Kakao Enterprise/Kakao Brain) released with the paper [ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision](https:
1. **[Vision Transformer (ViT)](model_doc/vit)** (from Google AI) released with the paper [An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale](https:
1. **[ViTMAE](model_doc/vit_mae)** (from Meta AI) released with the paper [Masked Autoencoders Are Scalable Vision Learners](https:
1. **[VisualBERT](model_doc/visual_bert)** (from UCLA NLP) released with the paper [VisualBERT: A Simple and Performant Baseline for Vision and Language](https:
1. **[WavLM](model_doc/wavlm)** (from Microsoft Research) released with the paper [WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing](https:
1. **[Wav2Vec2](model_doc/wav2vec2)** (from Facebook AI) released with the paper [wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations](https:
1. **[Wav2Vec2Phoneme](model_doc/wav2vec2_phoneme)** (from Facebook AI) released with the paper [Simple and Effective Zero-shot Cross-lingual Phoneme Recognition](https:
1. **[XGLM](model_doc/xglm)** (from Facebook AI) released with the paper [Few-shot Learning with Multilingual Language Models](https:
1. **[XLM](model_doc/xlm)** (from Facebook) released together with the paper [Cross-lingual Language Model Pretraining](https:
1. **[XLM-ProphetNet](model_doc/xlm-prophetnet)** (from Microsoft Research) released with the paper [ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training](https:
1. **[XLM-RoBERTa](model_doc/xlm-roberta)** (from Facebook AI), released together with the paper [Unsupervised Cross-lingual Representation Learning at Scale](https:
1. **[XLM-RoBERTa-XL](model_doc/xlm-roberta-xl)** (from Facebook AI), released together with the paper [Larger-Scale Transformers for Multilingual Masked Language Modeling](https:
1. **[XLNet](model_doc/xlnet)** (from Google/CMU) released with the paper [XLNet: Generalized Autoregressive Pretraining for Language Understanding](https:
1. **[XLSR-Wav2Vec2](model_doc/xlsr_wav2vec2)** (from Facebook AI) released with the paper [Unsupervised Cross-Lingual Representation Learning For Speech Recognition](https:
1. **[XLS-R](model_doc/xls_r)** (from Facebook AI) released with the paper [XLS-R: Self-supervised Cross-lingual Speech Representation Learning at Scale](https:
1. **[YOSO](model_doc/yoso)** (from the University of Wisconsin-Madison) released with the paper [You Only Sample (Almost) Once: Linear Cost Self-Attention Via Bernoulli Sampling](https:
|
|
|
|
|
### Supported frameworks
|
|
|
The table below represents the current support in the library for each of those models: whether they have a Python tokenizer (called "slow") and a "fast" tokenizer backed by the 🤗 Tokenizers library, and whether they have support in Jax (via Flax), PyTorch, and/or TensorFlow.
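The slow/fast distinction is visible at load time; a minimal sketch using `AutoTokenizer`'s `use_fast` flag:

```python
from transformers import AutoTokenizer

# use_fast=True (the default when a fast version exists) loads a tokenizer
# backed by the Rust 🤗 Tokenizers library; use_fast=False forces the
# pure-Python ("slow") implementation.
fast_tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased", use_fast=True)
slow_tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased", use_fast=False)

print(fast_tokenizer.is_fast)  # True
print(slow_tokenizer.is_fast)  # False
```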
|
|
|
<!--This table is updated automatically from the auto modules with _make fix-copies_. Do not update manually!--> |
|
|
|
| Model | Slow tokenizer | Fast tokenizer | PyTorch support | TensorFlow support | Flax support |
|:---------------------------:|:--------------:|:--------------:|:---------------:|:------------------:|:------------:|
| ALBERT | ✅ | ✅ | ✅ | ✅ | ✅ |
| BART | ✅ | ✅ | ✅ | ✅ | ✅ |
| BEiT | ❌ | ❌ | ✅ | ❌ | ✅ |
| BERT | ✅ | ✅ | ✅ | ✅ | ✅ |
| Bert Generation | ✅ | ❌ | ✅ | ❌ | ❌ |
| BigBird | ✅ | ✅ | ✅ | ❌ | ✅ |
| BigBirdPegasus | ❌ | ❌ | ✅ | ❌ | ❌ |
| Blenderbot | ✅ | ✅ | ✅ | ✅ | ✅ |
| BlenderbotSmall | ✅ | ✅ | ✅ | ✅ | ✅ |
| CamemBERT | ✅ | ✅ | ✅ | ✅ | ❌ |
| Canine | ✅ | ❌ | ✅ | ❌ | ❌ |
| CLIP | ✅ | ✅ | ✅ | ✅ | ✅ |
| ConvBERT | ✅ | ✅ | ✅ | ✅ | ❌ |
| ConvNext | ❌ | ❌ | ✅ | ✅ | ❌ |
| CTRL | ✅ | ❌ | ✅ | ✅ | ❌ |
| Data2VecAudio | ❌ | ❌ | ✅ | ❌ | ❌ |
| Data2VecText | ❌ | ❌ | ✅ | ❌ | ❌ |
| DeBERTa | ✅ | ✅ | ✅ | ✅ | ❌ |
| DeBERTa-v2 | ✅ | ❌ | ✅ | ✅ | ❌ |
| Decision Transformer | ❌ | ❌ | ✅ | ❌ | ❌ |
| DeiT | ❌ | ❌ | ✅ | ❌ | ❌ |
| DETR | ❌ | ❌ | ✅ | ❌ | ❌ |
| DistilBERT | ✅ | ✅ | ✅ | ✅ | ✅ |
| DPR | ✅ | ✅ | ✅ | ✅ | ❌ |
| DPT | ❌ | ❌ | ✅ | ❌ | ❌ |
| ELECTRA | ✅ | ✅ | ✅ | ✅ | ✅ |
| Encoder decoder | ❌ | ❌ | ✅ | ✅ | ✅ |
| FairSeq Machine-Translation | ✅ | ❌ | ✅ | ❌ | ❌ |
| FlauBERT | ✅ | ❌ | ✅ | ✅ | ❌ |
| FNet | ✅ | ✅ | ✅ | ❌ | ❌ |
| Funnel Transformer | ✅ | ✅ | ✅ | ✅ | ❌ |
| GLPN | ❌ | ❌ | ✅ | ❌ | ❌ |
| GPT Neo | ❌ | ❌ | ✅ | ❌ | ✅ |
| GPT-J | ❌ | ❌ | ✅ | ✅ | ✅ |
| Hubert | ❌ | ❌ | ✅ | ✅ | ❌ |
| I-BERT | ❌ | ❌ | ✅ | ❌ | ❌ |
| ImageGPT | ❌ | ❌ | ✅ | ❌ | ❌ |
| LayoutLM | ✅ | ✅ | ✅ | ✅ | ❌ |
| LayoutLMv2 | ✅ | ✅ | ✅ | ❌ | ❌ |
| LED | ✅ | ✅ | ✅ | ✅ | ❌ |
| Longformer | ✅ | ✅ | ✅ | ✅ | ❌ |
| LUKE | ✅ | ❌ | ✅ | ❌ | ❌ |
| LXMERT | ✅ | ✅ | ✅ | ✅ | ❌ |
| M2M100 | ✅ | ❌ | ✅ | ❌ | ❌ |
| Marian | ✅ | ❌ | ✅ | ✅ | ✅ |
| MaskFormer | ❌ | ❌ | ✅ | ❌ | ❌ |
| mBART | ✅ | ✅ | ✅ | ✅ | ✅ |
| MegatronBert | ❌ | ❌ | ✅ | ❌ | ❌ |
| MobileBERT | ✅ | ✅ | ✅ | ✅ | ❌ |
| MPNet | ✅ | ✅ | ✅ | ✅ | ❌ |
| mT5 | ✅ | ✅ | ✅ | ✅ | ✅ |
| Nystromformer | ❌ | ❌ | ✅ | ❌ | ❌ |
| OpenAI GPT | ✅ | ✅ | ✅ | ✅ | ❌ |
| OpenAI GPT-2 | ✅ | ✅ | ✅ | ✅ | ✅ |
| Pegasus | ✅ | ✅ | ✅ | ✅ | ✅ |
| Perceiver | ✅ | ❌ | ✅ | ❌ | ❌ |
| PLBart | ✅ | ❌ | ✅ | ❌ | ❌ |
| PoolFormer | ❌ | ❌ | ✅ | ❌ | ❌ |
| ProphetNet | ✅ | ❌ | ✅ | ❌ | ❌ |
| QDQBert | ❌ | ❌ | ✅ | ❌ | ❌ |
| RAG | ✅ | ❌ | ✅ | ✅ | ❌ |
| Realm | ✅ | ✅ | ✅ | ❌ | ❌ |
| Reformer | ✅ | ✅ | ✅ | ❌ | ❌ |
| RegNet | ❌ | ❌ | ✅ | ❌ | ❌ |
| RemBERT | ✅ | ✅ | ✅ | ✅ | ❌ |
| ResNet | ❌ | ❌ | ✅ | ❌ | ✅ |
| RetriBERT | ✅ | ✅ | ✅ | ❌ | ❌ |
| RoBERTa | ✅ | ✅ | ✅ | ✅ | ✅ |
| RoFormer | ✅ | ✅ | ✅ | ✅ | ✅ |
| SegFormer | ❌ | ❌ | ✅ | ❌ | ❌ |
| SEW | ❌ | ❌ | ✅ | ❌ | ❌ |
| SEW-D | ❌ | ❌ | ✅ | ❌ | ❌ |
| Speech Encoder decoder | ❌ | ❌ | ✅ | ❌ | ✅ |
| Speech2Text | ✅ | ❌ | ✅ | ✅ | ❌ |
| Speech2Text2 | ✅ | ❌ | ❌ | ❌ | ❌ |
| Splinter | ✅ | ✅ | ✅ | ❌ | ❌ |
| SqueezeBERT | ✅ | ✅ | ✅ | ❌ | ❌ |
| Swin | ❌ | ❌ | ✅ | ❌ | ❌ |
| T5 | ✅ | ✅ | ✅ | ✅ | ✅ |
| TAPAS | ✅ | ❌ | ✅ | ✅ | ❌ |
| TAPEX | ✅ | ✅ | ✅ | ✅ | ✅ |
| Transformer-XL | ✅ | ❌ | ✅ | ✅ | ❌ |
| TrOCR | ❌ | ❌ | ✅ | ❌ | ❌ |
| UniSpeech | ❌ | ❌ | ✅ | ❌ | ❌ |
| UniSpeechSat | ❌ | ❌ | ✅ | ❌ | ❌ |
| VAN | ❌ | ❌ | ✅ | ❌ | ❌ |
| ViLT | ❌ | ❌ | ✅ | ❌ | ❌ |
| Vision Encoder decoder | ❌ | ❌ | ✅ | ✅ | ✅ |
| VisionTextDualEncoder | ❌ | ❌ | ✅ | ❌ | ✅ |
| VisualBert | ❌ | ❌ | ✅ | ❌ | ❌ |
| ViT | ❌ | ❌ | ✅ | ✅ | ✅ |
| ViTMAE | ❌ | ❌ | ✅ | ✅ | ❌ |
| Wav2Vec2 | ✅ | ❌ | ✅ | ✅ | ✅ |
| WavLM | ❌ | ❌ | ✅ | ❌ | ❌ |
| XGLM | ✅ | ✅ | ✅ | ❌ | ✅ |
| XLM | ✅ | ❌ | ✅ | ✅ | ❌ |
| XLM-RoBERTa | ✅ | ✅ | ✅ | ✅ | ✅ |
| XLM-RoBERTa-XL | ❌ | ❌ | ✅ | ❌ | ❌ |
| XLMProphetNet | ✅ | ❌ | ✅ | ❌ | ❌ |
| XLNet | ✅ | ✅ | ✅ | ✅ | ❌ |
| YOSO | ❌ | ❌ | ✅ | ❌ | ❌ |
|
|
|
<!-- End table--> |
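In practice, a ✅ in a framework column means the model has a native class in that framework. A minimal sketch for BERT, which per the table is supported in all three frameworks (each framework library must be installed for its class to be importable):

```python
from transformers import BertModel, TFBertModel, FlaxBertModel

# The same pretrained checkpoint loads natively in PyTorch, TensorFlow,
# and Flax/JAX, because BERT has ✅ in every framework column above.
pt_model = BertModel.from_pretrained("bert-base-uncased")
tf_model = TFBertModel.from_pretrained("bert-base-uncased")
flax_model = FlaxBertModel.from_pretrained("bert-base-uncased")
```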
|
|