ljleb committed
Commit 682a6dd · verified · 1 Parent(s): 0898d5a

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -43,7 +43,7 @@ To put it simply, we compare the predictions of the models given the same inputs
 
 In a sense, the parameters with higher expected absolute gradients have a higher slope in the loss landscape. This means that merging these parameters using a naive weighted average approach will cause the loss L to change much more than other parameters with smaller expected absolute gradients.
 
-In our case with NoobAI and Animagine, as the loss landscape is highly non-linear, naively merging high curvature parameters completely decimates the loss instead of improving it: the merge cannot even denoise anything anymore. We then want to move them as little as possible, keep them in place as much as possible wherever we can.
+In our case with NoobAI and Animagine, as the loss landscape is highly non-linear, naively merging high slope parameters completely decimates the loss instead of improving it: the merge cannot even denoise anything anymore. We then want to move high slope parameters as little as possible, keep them in place as much as possible wherever we can.
 
 To this end, we merge the models according to the weighted average equation used in the Fisher-weighted averaging paper:
 
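The per-parameter weighted average described in the changed paragraph can be sketched as follows. This is a minimal illustration, not the repository's actual merge code: it assumes diagonal Fisher estimates (one non-negative weight per parameter), and the function name and dict-of-floats layout are hypothetical stand-ins for real checkpoint tensors.

```python
def fisher_weighted_average(theta_a, theta_b, fisher_a, fisher_b, eps=1e-8):
    """Merge two parameter sets, weighting each by its (diagonal) Fisher estimate.

    Parameters with a high Fisher weight in one model dominate the merge for
    that entry, so high-slope parameters are moved as little as possible.
    `eps` guards against division by zero where both Fisher weights vanish.
    """
    return {
        name: (fisher_a[name] * theta_a[name] + fisher_b[name] * theta_b[name])
        / (fisher_a[name] + fisher_b[name] + eps)
        for name in theta_a
    }
```

When both Fisher weights are equal, this reduces to a plain 50/50 average; as one model's weight grows relative to the other's, the merged value stays close to that model's parameter, which is the "keep them in place" behavior described above.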