The portion of the unlabeled data that the model predicts with the most confidence gets added to the labeled dataset and used to retrain the model.
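The self-training loop described above can be sketched as below. This is a minimal illustration under stated assumptions, not a reference implementation. The "model" is a hypothetical 1-D nearest-centroid classifier. Confidence is measured as the margin between the closest and second-closest class centroid. The helper names (`fit_centroids`, `predict_with_confidence`, `self_train`) and the `fraction` of the pool pseudo-labeled per round are all illustrative choices.

```python
from statistics import mean


def fit_centroids(xs, ys):
    # Hypothetical toy model: one centroid (mean value) per class label.
    return {label: mean(x for x, y in zip(xs, ys) if y == label) for label in set(ys)}


def predict_with_confidence(centroids, x):
    # Predict the nearest centroid; confidence = margin to the second-nearest one.
    dists = sorted((abs(x - c), label) for label, c in centroids.items())
    (d0, label), (d1, _) = dists[0], dists[1]
    return label, d1 - d0


def self_train(xs, ys, unlabeled, rounds=3, fraction=0.5):
    xs, ys, pool = list(xs), list(ys), list(unlabeled)
    for _ in range(rounds):
        if not pool:
            break
        model = fit_centroids(xs, ys)  # (re)train on the current labeled set
        # Rank unlabeled points from most to least confident prediction.
        scored = sorted(pool, key=lambda x: predict_with_confidence(model, x)[1], reverse=True)
        k = max(1, int(len(scored) * fraction))
        confident, pool = scored[:k], scored[k:]
        # Add the most confident predictions to the labeled set as pseudo-labels.
        for x in confident:
            xs.append(x)
            ys.append(predict_with_confidence(model, x)[0])
    return fit_centroids(xs, ys), xs, ys  # final retrain on labeled + pseudo-labeled data
```

For example, `self_train([0.0, 10.0], ["a", "b"], [1.0, 9.0, 2.0, 8.0])` pseudo-labels the two most confident points in the first round. It then retrains and labels the remaining points over the next rounds, ending with centroids `{"a": 1.0, "b": 9.0}`. In practice the toy classifier would be replaced by a real model, and a confidence threshold is often used instead of a fixed fraction.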
sequence-to-sequence (seq2seq)
Models that generate a new sequence from an input, such as translation models or summarization models (for example, BART or T5).
Sharded DDP
Another name for the foundational ZeRO concept, as used by various other implementations of ZeRO.
stride
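In convolution or pooling, the stride refers to the distance the kernel is moved over a matrix at each step. A stride of 1 means the kernel is moved one pixel over at a time, and a stride of 2 means the kernel is moved two pixels over at a time.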
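The effect of the stride on output size can be seen in a small sketch. It is a plain-Python "valid" 2-D cross-correlation, the operation CNN convolution layers compute. It assumes no padding and a single channel. The window's top-left corner advances by `stride` in each direction, so the output has `(n - k) // stride + 1` positions along a dimension of size `n` with kernel size `k`. The function name `conv2d` is illustrative and is not any particular library's API.

```python
def conv2d(image, kernel, stride=1):
    # "Valid" 2-D cross-correlation (no padding) of a single-channel image.
    n_rows, n_cols = len(image), len(image[0])
    k_rows, k_cols = len(kernel), len(kernel[0])
    # Number of window positions per dimension: floor((n - k) / stride) + 1.
    out_rows = (n_rows - k_rows) // stride + 1
    out_cols = (n_cols - k_cols) // stride + 1
    out = []
    for i in range(out_rows):
        row = []
        for j in range(out_cols):
            # The window's top-left corner jumps by `stride` pixels each step.
            r0, c0 = i * stride, j * stride
            row.append(sum(image[r0 + a][c0 + b] * kernel[a][b]
                           for a in range(k_rows) for b in range(k_cols)))
        out.append(row)
    return out


image = [[5 * r + c for c in range(5)] for r in range(5)]  # 5x5 values 0..24
kernel = [[1] * 3 for _ in range(3)]                        # 3x3 box filter
print(len(conv2d(image, kernel, stride=1)))  # 3 -> 3x3 output
print(conv2d(image, kernel, stride=2))       # [[54, 72], [144, 162]]
```

With a 5x5 input and a 3x3 kernel, a stride of 1 visits 3x3 window positions, while a stride of 2 skips every other position and yields a 2x2 output. Pooling layers move their window the same way, with a max or average in place of the weighted sum.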