The model updates its weights based on how incorrect its predictions were, and the process is repeated to optimize model performance.
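A minimal sketch of this update loop in PyTorch, using a hypothetical toy linear model, synthetic data, and an illustrative choice of loss and optimizer (none of these specifics come from the definition above):

```python
import torch
from torch import nn

# Toy model and training setup; all of these choices are illustrative.
model = nn.Linear(4, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.MSELoss()

inputs, labels = torch.randn(8, 4), torch.randn(8, 1)
for step in range(100):
    predictions = model(inputs)
    loss = loss_fn(predictions, labels)  # how incorrect the predictions were
    optimizer.zero_grad()
    loss.backward()   # gradients of the loss w.r.t. the weights
    optimizer.step()  # update the weights, then repeat
```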
T
Tensor Parallelism (TP)
Parallelism technique for training on multiple GPUs in which each tensor is split into multiple chunks, so instead of the whole tensor residing on a single GPU, each shard of the tensor resides on its designated GPU.
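A minimal sketch of the idea for a single linear layer, assuming the weight matrix is sharded column-wise. A real implementation places each shard on a different GPU and syncs results with collective ops; this toy version runs the shards sequentially on one device purely to show the math:

```python
import torch

def tensor_parallel_linear(x, weight, num_shards=2):
    # Split the weight into column shards; in practice each shard
    # would live on its own GPU rather than in a Python list.
    shards = torch.chunk(weight, num_shards, dim=1)
    # Each shard computes its partial output independently
    # (in parallel across GPUs in a real setup).
    partial_outputs = [x @ shard for shard in shards]
    # Combine the per-shard results (an all-gather in practice).
    return torch.cat(partial_outputs, dim=-1)

x = torch.randn(4, 8)        # batch of activations
weight = torch.randn(8, 16)  # full weight matrix
out = tensor_parallel_linear(x, weight)
# The sharded computation matches the unsharded one.
assert torch.allclose(out, x @ weight, atol=1e-6)
```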