You left some bugs in the design

#2
by win10 - opened

The design has had some bugs from the start, such as the classic multi-GPU training problem: RuntimeError: Trying to backward through the graph a second time. A single backward pass without gradient accumulation works fine, but as soon as gradient accumulation is introduced, the error appears.
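For anyone trying to reproduce this, here is a minimal sketch of one common way this error shows up under gradient accumulation. It is not this repository's code: `StatefulBlock`, `detach_state`, and `accumulate` are hypothetical names, and the assumption is that some tensor is cached between forward calls while still attached to the previous micro-batch's graph. The actual cause in this architecture may be different.

```python
import torch
import torch.nn as nn


class StatefulBlock(nn.Module):
    """Hypothetical block that caches a tensor between forward calls."""

    def __init__(self, dim=4, detach_state=False):
        super().__init__()
        self.proj = nn.Linear(dim, dim)
        self.detach_state = detach_state
        self.state = None  # carried over from the previous forward call

    def forward(self, x):
        h = torch.tanh(self.proj(x))
        if self.state is not None:
            # If self.state is still attached to the previous graph,
            # the next backward() must walk through that (already freed) graph.
            h = h + self.state
        self.state = h.detach() if self.detach_state else h
        return h


def accumulate(detach_state, micro_batches=2):
    """Run several backward passes without an optimizer step in between."""
    torch.manual_seed(0)
    block = StatefulBlock(detach_state=detach_state)
    for _ in range(micro_batches):
        loss = block(torch.randn(3, 4)).pow(2).mean()
        loss.backward()  # gradients accumulate in .grad across micro-batches
    return block.proj.weight.grad


if __name__ == "__main__":
    try:
        accumulate(detach_state=False)
    except RuntimeError as err:
        print("without detach:", err)
    print("with detach:", accumulate(detach_state=True).shape)
```

In this sketch, detaching the cached tensor cuts the link to the earlier graph, so each micro-batch can be backpropagated independently. Passing `retain_graph=True` to `backward()` also avoids the error, but it keeps the old graphs in memory and changes which gradients flow, so it is usually not the fix you want.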

Ah I see

Can the current architecture be used with phi4?
I tried my best to adjust the hyperparameters and hard-coded values, but I always get 'loss': 0.0, 'grad_norm': nan.

win10 changed discussion status to closed
