|
If you don't configure the optimizer in the config, the [Trainer] automatically selects AdamW and uses the values supplied on the command line, or the defaults, for the following parameters: `lr`, `adam_beta1`, `adam_beta2`, `adam_epsilon`, `weight_decay`.
|
You can set the parameters to "auto" or manually input your own desired values. |
|
```yaml
{
    "optimizer": {
        "type": "AdamW",
        "params": {
            "lr": "auto",
            "betas": "auto",
            "eps": "auto",
            "weight_decay": "auto"
        }
    }
}
```
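
For example, a minimal sketch with explicitly chosen values instead of `"auto"` (the numbers here are illustrative, not recommendations):

```yaml
{
    "optimizer": {
        "type": "AdamW",
        "params": {
            "lr": 2e-5,
            "betas": [0.9, 0.999],
            "eps": 1e-8,
            "weight_decay": 0.01
        }
    }
}
```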
|
You can also use an unsupported optimizer by adding the following to the top-level configuration.
|
```yaml
{
    "zero_allow_untested_optimizer": true
}
```
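
For context, a sketch of where the flag sits relative to other top-level sections; the `zero_optimization` block here is illustrative, not a requirement:

```yaml
{
    "zero_allow_untested_optimizer": true,
    "zero_optimization": {
        "stage": 2
    }
}
```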
|
From DeepSpeed==0.8.3 on, if you want to use offload, you'll also need to add the following to the top-level configuration because offload works best with DeepSpeed's CPU Adam optimizer.
|
```yaml
{
    "zero_force_ds_cpu_optimizer": false
}
```
|
|
|
DeepSpeed supports the LRRangeTest, OneCycle, WarmupLR and WarmupDecayLR learning rate schedulers. |
|
Transformers and DeepSpeed provide two of the same schedulers: |
|
|
|
- WarmupLR is the same as `--lr_scheduler_type constant_with_warmup` in Transformers (see the sketch after this list)
- WarmupDecayLR is the same as `--lr_scheduler_type linear` in Transformers (this is the default scheduler used in Transformers)
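
As an example, a minimal sketch that selects WarmupLR explicitly (the `constant_with_warmup` equivalent), with the `"auto"` values filled in by [Trainer]:

```yaml
{
    "scheduler": {
        "type": "WarmupLR",
        "params": {
            "warmup_min_lr": "auto",
            "warmup_max_lr": "auto",
            "warmup_num_steps": "auto"
        }
    }
}
```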
|
|
|
If you don't configure the scheduler in the config, the [Trainer] automatically selects WarmupDecayLR and uses the values supplied on the command line, or the defaults, for the following parameters: `warmup_min_lr`, `warmup_max_lr`, `warmup_num_steps`, `total_num_steps` (automatically calculated at run time if `max_steps` is not provided).
|
You can set the parameters to "auto" or manually input your own desired values. |
|
```yaml
{
    "scheduler": {
        "type": "WarmupDecayLR",
        "params": {
            "total_num_steps": "auto",
            "warmup_min_lr": "auto",
            "warmup_max_lr": "auto",
            "warmup_num_steps": "auto"
        }
    }
}
```
|
|
|
## Precision
|
DeepSpeed supports fp32 full precision, and fp16 and bf16 mixed precision.
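
Precision is also controlled through the config. A minimal sketch with `"auto"` values, which lets [Trainer] set them from the command-line arguments (in practice only one of the two should end up enabled):

```yaml
{
    "fp16": {
        "enabled": "auto"
    },
    "bf16": {
        "enabled": "auto"
    }
}
```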
|
|
|
If your model doesn't work well with mixed precision, for example if it wasn't pretrained in mixed precision, you may encounter overflow or underflow issues, which can cause NaN loss.