You left some bugs in the design
#2, opened by win10
You left some bugs in the initial design, such as the classic problem in multi-GPU training: RuntimeError: Trying to backward through the graph a second time. A single backward pass without gradient accumulation works fine, but as soon as gradient accumulation is introduced, training fails with this error.
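For readers hitting the same error, here is a minimal PyTorch sketch of one common way it arises and of an accumulation loop that avoids it. This is not the repository's training code: the toy model, the data, and the names `shared`, `accum_steps`, and `batches` are all hypothetical, and the actual cause in this project may be different.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Part 1: one way the error can arise (illustrative only).
# A tensor built once is reused by a second backward pass.
w = torch.randn(3, requires_grad=True)
shared = w.exp()             # graph for `shared` is built once, here
shared.sum().backward()      # first backward frees that graph's saved tensors

error_message = ""
try:
    # This loss still depends on the already-freed graph behind `shared`.
    (shared * 2).sum().backward()
except RuntimeError as err:
    error_message = str(err)

# Part 2: gradient accumulation that rebuilds the graph per micro-batch.
model = nn.Linear(4, 1)                      # hypothetical toy model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
accum_steps = 4                              # hypothetical setting
batches = [(torch.randn(8, 4), torch.randn(8, 1)) for _ in range(8)]

optimizer_steps = 0
optimizer.zero_grad()
for i, (x, y) in enumerate(batches, start=1):
    # The forward pass runs inside the loop, so every backward call
    # gets a fresh graph; gradients add up in each parameter's .grad.
    loss = nn.functional.mse_loss(model(x), y) / accum_steps
    loss.backward()
    if i % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
        optimizer_steps += 1
```

If a tensor really must be carried across micro-batches, calling `.detach()` on it before reuse cuts it off from the old graph; passing `retain_graph=True` to `backward()` also silences the error, but it keeps the old graph in memory.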
Ah I see
Can the current architecture be used with phi4?
I tried my best to adjust the hyperparameters and hard-coded values, but I always get 'loss': 0.0, 'grad_norm': nan.
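One way to narrow down a `grad_norm` of nan is to check which parameters hold non-finite gradients right after `backward()`. The sketch below assumes plain PyTorch; the helper `nonfinite_grad_params` and the toy model are hypothetical, and the NaN is injected by hand only to show what the check reports. It does not diagnose the phi4 case itself.

```python
import torch
from torch import nn

def nonfinite_grad_params(model: nn.Module) -> list[str]:
    """Return names of parameters whose gradient contains NaN or Inf."""
    return [
        name
        for name, p in model.named_parameters()
        if p.grad is not None and not torch.isfinite(p.grad).all()
    ]

torch.manual_seed(0)
model = nn.Linear(4, 2)                  # hypothetical stand-in model
loss = model(torch.randn(3, 4)).sum()
loss.backward()
clean = nonfinite_grad_params(model)     # healthy gradients: nothing reported

# Simulate a broken step by poisoning a single gradient entry.
model.weight.grad[0, 0] = float("nan")
bad = nonfinite_grad_params(model)

# clip_grad_norm_ returns the total gradient norm, which is what
# trainers commonly log as grad_norm; one NaN entry makes it NaN.
total_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
```

Running the check once per step and printing the first offending parameter name usually shows whether the problem starts in a particular layer or affects the whole model at once.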
win10 changed discussion status to closed