In a sense, parameters with higher expected absolute gradients sit on a steeper slope of the loss landscape. This means that merging these parameters with a naive weighted average will change the loss L much more than merging parameters with smaller expected absolute gradients.
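As a toy illustration of that point (everything here is invented for the sketch, not taken from this repository): with a simple quadratic loss whose gradients are available in closed form, the parameter with the larger expected absolute gradient changes the loss far more when moved by the same amount.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy quadratic loss L(theta) = 0.5 * sum(c * (theta - t)**2); the per-parameter
# curvature c stands in for how sensitive the loss is to each parameter.
c = np.array([10.0, 0.1])       # param 0 is "high slope", param 1 is nearly flat
theta_a = np.array([1.0, 1.0])  # parameters of a hypothetical model A

def grad(theta, target):
    return c * (theta - target)

def loss(theta, target):
    return 0.5 * np.sum(c * (theta - target) ** 2)

# Expected absolute gradient per parameter over random "inputs" (targets).
targets = rng.normal(loc=theta_a, scale=1.0, size=(10_000, 2))
exp_abs_grad = np.abs(grad(theta_a[None, :], targets)).mean(axis=0)
print(exp_abs_grad)  # roughly [8.0, 0.08]: a ~100x gap between the two parameters

# Moving each parameter by the same step of 1.0 changes the loss very differently.
bump_high = loss(theta_a + np.array([1.0, 0.0]), theta_a)  # nudge the steep param
bump_low = loss(theta_a + np.array([0.0, 1.0]), theta_a)   # nudge the flat param
print(bump_high, bump_low)  # 5.0 vs 0.05
```

A naive 50/50 average moves both kinds of parameters equally, so the steep parameter dominates the resulting change in loss.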
In our case with NoobAI and Animagine, the loss landscape is highly non-linear, so naively averaging high-slope parameters destroys the loss instead of improving it: the merged model cannot even denoise anymore. We therefore want to move high-slope parameters as little as possible, keeping them close to their original values wherever we can.
To this end, we merge the models according to the weighted average equation used in the Fisher-weighted averaging paper:
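A minimal sketch of the general shape of that weighted average, assuming diagonal (per-parameter) Fisher estimates; variable names are invented here, and this is not the repository's exact implementation:

```python
import numpy as np

def fisher_weighted_average(theta_a, theta_b, fisher_a, fisher_b, eps=1e-8):
    """Per-parameter weighted average where each model's weight is its
    diagonal Fisher estimate: parameters a model is highly sensitive to
    are pulled toward that model's value instead of being averaged naively."""
    num = fisher_a * theta_a + fisher_b * theta_b
    den = fisher_a + fisher_b + eps  # eps guards parameters with ~zero Fisher
    return num / den

theta_a = np.array([1.0, 1.0])
theta_b = np.array([3.0, 3.0])
fisher_a = np.array([100.0, 1.0])  # model A is very sensitive to param 0
fisher_b = np.array([1.0, 1.0])

merged = fisher_weighted_average(theta_a, theta_b, fisher_a, fisher_b)
print(merged)  # param 0 stays near A's value; param 1 lands midway
```

Compared with a naive 50/50 average, the high-Fisher parameter barely moves, which is exactly the behavior motivated above.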