[BUG] {'use_reentrant': True} results in "Gradients will be None"
#74 by RonanMcGovern - opened
It seems there is no way to use reentrant gradient checkpointing (`{'use_reentrant': True}`) without hitting "Gradients will be None". As a result, fine-tuning uses a lot of memory.
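For anyone trying to reproduce this outside the model, here is a minimal sketch using plain `torch.utils.checkpoint` rather than this repo's code. The `run` helper and the tiny `Linear` layer are made up for illustration. It assumes the message comes from PyTorch's reentrant checkpoint path, which warns when none of the checkpointed inputs has `requires_grad=True`; that may or may not be the exact cause here.

```python
import warnings

import torch
from torch.utils.checkpoint import checkpoint


def run(use_reentrant: bool, input_requires_grad: bool):
    """Checkpoint one toy layer and report (warned, layer_got_grad).

    Hypothetical helper for illustration only, not part of any library.
    """
    torch.manual_seed(0)
    layer = torch.nn.Linear(4, 4)
    x = torch.randn(2, 4, requires_grad=input_requires_grad)

    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        out = checkpoint(layer, x, use_reentrant=use_reentrant)
        # Only call backward when the output is attached to the graph;
        # otherwise backward() would raise.
        if out.requires_grad:
            out.sum().backward()

    warned = any("Gradients will be None" in str(w.message) for w in caught)
    layer_got_grad = layer.weight.grad is not None
    return warned, layer_got_grad


if __name__ == "__main__":
    for use_reentrant, input_requires_grad in [(True, False), (True, True), (False, False)]:
        warned, got_grad = run(use_reentrant, input_requires_grad)
        print(
            f"use_reentrant={use_reentrant}, input_requires_grad={input_requires_grad}: "
            f"warned={warned}, layer_got_grad={got_grad}"
        )
```

If the same thing is happening in the fine-tuning setup, two things that may be worth trying (assuming a recent `transformers` version) are calling `model.enable_input_require_grads()` before training so the checkpointed inputs require grad, or passing `gradient_checkpointing_kwargs={"use_reentrant": False}` to use the non-reentrant path instead.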