	To learn more about the other available FSDP options, take a look at the fsdp_config parameters.
	Sharding strategy
	FSDP offers a number of sharding strategies to select from:

	FULL_SHARD - shards model parameters, gradients, and optimizer states across workers; select 1 for this option
	SHARD_GRAD_OP - shards gradients and optimizer states across workers; select 2 for this option
	NO_SHARD - doesn't shard anything (this is equivalent to DDP); select 3 for this option
	HYBRID_SHARD - shards model parameters, gradients, and optimizer states within each node, while each node keeps a full copy of the model; select 4 for this option
	HYBRID_SHARD_ZERO2 - shards gradients and optimizer states within each node, while each node keeps a full copy of the model; select 5 for this option

	The strategy is selected with the fsdp_sharding_strategy flag.
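
As a rough illustration of how the numbered options above map to strategy names, here is a minimal Python sketch. The `ShardingStrategy` enum and the helpers `resolve_sharding_strategy` and `build_fsdp_config` are hypothetical names made up for this example, not library APIs. Only the `fsdp_sharding_strategy` key comes from the text, and whether your tooling expects the name or the number as its value is an assumption to check against your version.

```python
from enum import IntEnum


class ShardingStrategy(IntEnum):
    """Illustrative mirror of the numbered options listed above (not a library class)."""
    FULL_SHARD = 1
    SHARD_GRAD_OP = 2
    NO_SHARD = 3
    HYBRID_SHARD = 4
    HYBRID_SHARD_ZERO2 = 5


def resolve_sharding_strategy(choice):
    """Accept a number (1-5, as int or digit string) or a strategy name."""
    if isinstance(choice, str) and not choice.strip().isdigit():
        # Look up by name, ignoring case and surrounding whitespace.
        return ShardingStrategy[choice.strip().upper()]
    # Look up by number; raises ValueError for anything outside 1-5.
    return ShardingStrategy(int(choice))


def build_fsdp_config(choice):
    """Return a config fragment; storing the name (not the number) is an assumption."""
    strategy = resolve_sharding_strategy(choice)
    return {"fsdp_sharding_strategy": strategy.name}
```

For example, `build_fsdp_config(1)` and `build_fsdp_config("full_shard")` produce the same fragment, so a script can accept either form from the user.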
	CPU offload
	You can also offload parameters and gradients to the CPU when they are not in use. This saves more GPU memory and helps you fit large models for which FSDP alone is not sufficient.
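
A minimal sketch of switching offload on in a config fragment follows. It assumes the option is named `fsdp_offload_params`, as in Accelerate-style FSDP configs; that key does not appear in the text above, so verify it against your version. The helper `with_cpu_offload` is a hypothetical name used only for this example.

```python
def with_cpu_offload(config, enabled=True):
    """Return a copy of an FSDP config fragment with CPU offload toggled.

    The key name fsdp_offload_params is an assumption; the input dict is not mutated.
    """
    updated = dict(config)
    updated["fsdp_offload_params"] = bool(enabled)
    return updated


base = {"fsdp_sharding_strategy": "FULL_SHARD"}
offloaded = with_cpu_offload(base)
```

Returning a copy keeps the base configuration reusable, which makes it easy to compare runs with and without offload.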