This is what is sometimes called horizontal parallelism, as the splitting happens on horizontal level. Learn more about Tensor Parallelism here. token A part of a sentence, usually a word, but can also be a subword (non-common words are often split in subwords) or a punctuation symbol. token Type IDs Some models' purpose is to do classification on pairs of sentences or question answering. These require two different sequences to be joined in a single "input_ids" entry, which usually is performed with the help of special tokens, such as the classifier ([CLS]) and separator ([SEP]) tokens.