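The portion of the unlabeled data that the model predicts with the highest confidence is added to the labeled dataset, which is then used to retrain the model.

sequence-to-sequence (seq2seq)
Models that generate a new sequence from an input sequence, such as translation or summarization models (for example, BART or T5).

Sharded DDP
Another name for the foundational ZeRO concept, in which training state is partitioned across data-parallel workers instead of being replicated on each one. The name is used by various other implementations of ZeRO.

stride
In convolution or pooling, the stride is the distance the kernel moves over a matrix at each step. A stride of 1 moves the kernel one pixel at a time. A stride of 2 moves it two pixels at a time, so it skips every other position and produces a smaller output.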
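To make the effect of stride concrete, here is a minimal pure-Python sketch of a 1D sliding-window operation. It assumes no padding and no dilation, and it follows the deep-learning convention in which "convolution" does not flip the kernel (strictly, this is cross-correlation). The helper names `conv1d` and `output_length` are hypothetical and used only for illustration. Without padding, the output length is `(n - k) // stride + 1` for an input of length `n` and a kernel of length `k`.

```python
def conv1d(signal, kernel, stride=1):
    """Slide `kernel` over `signal` (no padding), moving `stride` positions per step."""
    k = len(kernel)
    outputs = []
    # Window start positions are 0, stride, 2*stride, ... as long as the kernel still fits.
    for start in range(0, len(signal) - k + 1, stride):
        window = signal[start:start + k]
        # Deep-learning convention: element-wise multiply and sum, without flipping the kernel.
        outputs.append(sum(w * x for w, x in zip(kernel, window)))
    return outputs


def output_length(n, k, stride):
    """Number of outputs for input length n, kernel length k, and no padding."""
    return (n - k) // stride + 1


signal = [1, 2, 3, 4, 5, 6, 7]
kernel = [1, 1, 1]  # sums each window of three values

print(conv1d(signal, kernel, stride=1))  # [6, 9, 12, 15, 18]
print(conv1d(signal, kernel, stride=2))  # [6, 12, 18]
print(output_length(7, 3, 1), output_length(7, 3, 2))  # 5 3
```

With a stride of 2, the windows that start at positions 1 and 3 are skipped, so the output shrinks from 5 values to 3. Pooling layers apply the same stride logic, using an operation such as max or average over each window in place of the weighted sum.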