Unsupervised learning techniques leverage statistical information about the data distribution, without relying on labels, to find patterns useful for the task at hand.
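As one concrete illustration, k-means clustering is a classic unsupervised technique. It groups unlabeled points using only their distances to one another. The sketch below is a minimal 1-D version in pure Python. `kmeans_1d` and its parameters are hypothetical names chosen for this example, not a library API.

```python
import random
from typing import List


def kmeans_1d(points: List[float], k: int, iters: int = 20, seed: int = 0) -> List[float]:
    """Minimal 1-D k-means (illustrative sketch, not a library function)."""
    rng = random.Random(seed)
    # Initialize centroids by picking k distinct data points at random.
    centroids = rng.sample(points, k)
    for _ in range(iters):
        # Assignment step: attach each point to its nearest centroid.
        clusters: List[List[float]] = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)
        ]
    return sorted(centroids)


if __name__ == "__main__":
    data = [1.0, 1.2, 0.8, 10.0, 10.3, 9.7]  # two obvious groups, no labels
    print(kmeans_1d(data, k=2))
```

No labels are supplied. The two group centers (about 1.0 and 10.0) emerge purely from the structure of the data.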
|
Z
|
Zero Redundancy Optimizer (ZeRO)
|
A parallelism technique that shards tensors across devices, somewhat like TensorParallel. Unlike TensorParallel, however, each full tensor is reconstructed in time for the forward or backward computation, so the model does not need to be modified. ZeRO also supports various offloading techniques to compensate for limited GPU memory.
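To make the "shard, then reconstruct just in time" idea concrete, here is a single-process simulation in pure Python. Ranks are modeled as entries in a list, and the all-gather collective is modeled as list concatenation. The functions `shard_params`, `all_gather`, and `sharded_linear_forward` are hypothetical names for this sketch. They are not the DeepSpeed or PyTorch FSDP API, and real implementations run actual collective communication across GPUs and also handle gradients and optimizer states.

```python
from typing import List


def shard_params(flat: List[float], world_size: int) -> List[List[float]]:
    # Each simulated rank keeps only a contiguous 1/world_size slice of the
    # flattened parameters; the tail is zero-padded so all shards are equal size.
    shard_size = -(-len(flat) // world_size)  # ceiling division
    padded = flat + [0.0] * (shard_size * world_size - len(flat))
    return [padded[r * shard_size:(r + 1) * shard_size] for r in range(world_size)]


def all_gather(shards: List[List[float]], numel: int) -> List[float]:
    # Simulated all-gather: concatenate every rank's shard, then strip the
    # padding to recover the original full (flattened) tensor.
    return [v for s in shards for v in s][:numel]


def sharded_linear_forward(
    shards: List[List[float]], rows: int, cols: int, x: List[float]
) -> List[float]:
    # Reconstruct the full row-major weight just in time for the forward pass...
    weight = all_gather(shards, rows * cols)
    y = [sum(weight[i * cols + j] * x[j] for j in range(cols)) for i in range(rows)]
    # ...then drop the reference to the gathered copy; only the shards persist.
    del weight
    return y


if __name__ == "__main__":
    W = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]  # a 2x3 weight, stored row-major
    shards = shard_params(W, world_size=4)
    print([len(s) for s in shards])  # [2, 2, 2, 2]
    print(sharded_linear_forward(shards, 2, 3, [1.0, 1.0, 1.0]))  # [6.0, 15.0]
```

Each rank persistently stores only 2 of the 6 weights (plus padding), yet the layer computes the same result as an unsharded one. This is why the model definition itself does not need to change.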