For example:
[GPT2ForSequenceClassification] is a sequence classification head - a linear layer - on top of the base [GPT2Model].
[ViTForImageClassification] is an image classification head - a linear layer on top of the final hidden state of the CLS token - on top of the base [ViTModel].
[Wav2Vec2ForCTC] is a language modeling head with CTC on top of the base [Wav2Vec2Model].
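The base-model-plus-head pattern in these examples can be illustrated with a toy NumPy sketch. This is not how the library implements any of these classes: `toy_base_model`, `toy_classification_head`, the sizes, and the choice to classify from the last token's hidden state are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
hidden_size, num_labels, vocab_size = 8, 3, 100  # hypothetical toy sizes

# Stand-in for a base model: here just an embedding lookup table.
embedding_table = rng.normal(size=(vocab_size, hidden_size))


def toy_base_model(token_ids):
    """Return one hidden state per input token, shape (seq_len, hidden_size)."""
    return embedding_table[token_ids]


# The "head" is a single linear layer mapping a hidden state to label scores.
head_weight = rng.normal(size=(hidden_size, num_labels))


def toy_classification_head(hidden_states):
    """Apply the linear layer to the last token's hidden state (an assumption)."""
    return hidden_states[-1] @ head_weight


token_ids = np.array([5, 17, 42, 8])
hidden_states = toy_base_model(token_ids)         # base model output
logits = toy_classification_head(hidden_states)   # head on top of it
print(hidden_states.shape, logits.shape)
```

The base model knows nothing about the task. Only the small head turns its hidden states into task-specific outputs, which is why the same base can carry different heads.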
I
image patch
Vision-based Transformer models split an image into smaller patches, which are linearly embedded and then passed to the model as a sequence.
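As a rough illustration of this definition, the NumPy sketch below splits an image into non-overlapping square patches, flattens each one, and applies a linear projection. The helper `split_into_patches`, the 224x224 image, the patch size of 16, and the embedding size of 64 are all hypothetical choices for this example. Real models learn the projection weights and may implement the step differently.

```python
import numpy as np


def split_into_patches(image, patch_size):
    """Split an (H, W, C) image into flattened, non-overlapping patches."""
    height, width, channels = image.shape
    if height % patch_size or width % patch_size:
        raise ValueError("image height and width must be divisible by patch_size")
    rows, cols = height // patch_size, width // patch_size
    # (rows, patch, cols, patch, C) -> (rows, cols, patch, patch, C)
    patches = image.reshape(rows, patch_size, cols, patch_size, channels)
    patches = patches.transpose(0, 2, 1, 3, 4)
    # One row per patch, in row-major order over the patch grid.
    return patches.reshape(rows * cols, patch_size * patch_size * channels)


rng = np.random.default_rng(0)
image = rng.random((224, 224, 3))            # random stand-in for a real image
patches = split_into_patches(image, 16)      # 14 x 14 = 196 patches

# Linear embedding: random weights here, learned weights in a real model.
projection = rng.normal(size=(patches.shape[1], 64))
embeddings = patches @ projection            # the sequence passed to the model
print(patches.shape, embeddings.shape)
```

Each row of `embeddings` plays the role a token embedding plays in a text model, so the image is presented to the Transformer as a sequence of 196 items.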