ljleb committed
Commit 682a6dd · verified · 1 Parent(s): 0898d5a

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -43,7 +43,7 @@ To put it simply, we compare the predictions of the models given the same inputs
 
 In a sense, the parameters with higher expected absolute gradients have a higher slope in the loss landscape. This means that merging these parameters using a naive weighted average approach will cause the loss L to change much more than other parameters with smaller expected absolute gradients.
 
-In our case with NoobAI and Animagine, as the loss landscape is highly non-linear, naively merging high curvature parameters completely decimates the loss instead of improving it: the merge cannot even denoise anything anymore. We then want to move them as little as possible, keep them in place as much as possible wherever we can.
+In our case with NoobAI and Animagine, as the loss landscape is highly non-linear, naively merging high slope parameters completely decimates the loss instead of improving it: the merge cannot even denoise anything anymore. We then want to move high slope parameters as little as possible, keep them in place as much as possible wherever we can.
 
 To this end, we merge the models according to the weighted average equation used in the Fisher-weighted averaging paper:
 
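The per-parameter weighted average described in the changed paragraph can be sketched as follows. This is a minimal illustration, not the repository's actual merge code: it assumes diagonal Fisher estimates (one non-negative weight per parameter), and the function name and dict-of-floats layout are hypothetical stand-ins for real checkpoint tensors.

```python
def fisher_weighted_average(theta_a, theta_b, fisher_a, fisher_b, eps=1e-8):
    """Merge two parameter sets, weighting each by its (diagonal) Fisher estimate.

    Parameters with a high Fisher weight in one model dominate the merge for
    that entry, so high-slope parameters are moved as little as possible.
    `eps` guards against division by zero where both Fisher weights vanish.
    """
    return {
        name: (fisher_a[name] * theta_a[name] + fisher_b[name] * theta_b[name])
        / (fisher_a[name] + fisher_b[name] + eps)
        for name in theta_a
    }
```

When both Fisher weights are equal, this reduces to a plain 50/50 average; as one model's weight grows relative to the other's, the merged value stays close to that model's parameter, which is the "keep them in place" behavior described above.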