Unsupervised learning techniques leverage statistical information about the data distribution to find patterns useful for the task at hand.
Z
Zero Redundancy Optimizer (ZeRO)
A parallelism technique that shards tensors, somewhat like TensorParallel,
except that the whole tensor is reconstructed in time for each forward or backward computation, so the model doesn't need
to be modified. This method also supports various offloading techniques to compensate for limited GPU memory.
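The shard-then-reconstruct idea can be sketched in a few lines of plain Python. This is a toy, single-process illustration and not how any real ZeRO implementation is written: the "ranks" are simulated with lists, and the helper names `shard`, `all_gather`, and `linear_forward` are hypothetical, chosen only for this example. It shows that each rank holds only a slice of the parameters, that the full tensor is rebuilt just before the computation, and that the model code itself is unchanged.

```python
# Toy, single-process illustration of ZeRO-style parameter sharding.
# Ranks are simulated with plain lists; there are no GPUs and no real communication.

def shard(flat_params, world_size):
    """Split a flat parameter list into world_size contiguous shards."""
    size = -(-len(flat_params) // world_size)  # ceiling division
    return [flat_params[r * size:(r + 1) * size] for r in range(world_size)]

def all_gather(shards):
    """Rebuild the full flat parameter list from every rank's shard."""
    return [p for s in shards for p in s]

def linear_forward(weights, x, n_out):
    """Unmodified model code: y = W x, with W stored row-major in a flat list."""
    n_in = len(x)
    return [sum(weights[i * n_in + j] * x[j] for j in range(n_in))
            for i in range(n_out)]

full_weights = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]  # a 2x3 weight matrix
x = [1.0, 0.0, -1.0]

# Between computations, each simulated rank stores only its own shard.
shards = shard(full_weights, world_size=4)

# Just in time for the forward pass, the full tensor is reconstructed,
# used by the unchanged forward function, and then released again.
gathered = all_gather(shards)
y_sharded = linear_forward(gathered, x, n_out=2)
del gathered

# Reference result computed from the unsharded weights.
y_reference = linear_forward(full_weights, x, n_out=2)
```

Here `y_sharded` and `y_reference` are identical, because sharding changes only where the parameters are stored between computations, not the computation itself. Offloading is not modelled in this sketch.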