Update README.md
The goal is to estimate which parameters of each model are most important and which are less so. I did this by first computing gradients with respect to this objective:

$$
L = E_{x \sim p(x), t \propto \frac{1}{snr_t^2}} \big[\frac{L_0(x, t)}{C(x, t)}\big]
$$

where L_0 expresses what we are trying to optimize (A corresponds to NoobAI, B corresponds to Animagine):
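As a rough illustration of the timestep distribution in the objective, here is a small numpy sketch that draws t with probability proportional to 1/snr_t². The linear beta schedule, step count, and all names below are assumptions for the sketch, not taken from this repo:

```python
import numpy as np

num_steps = 1000
betas = np.linspace(1e-4, 0.02, num_steps)  # assumed linear DDPM schedule
alpha_bar = np.cumprod(1.0 - betas)         # \bar{alpha}_t, decreasing in t

# snr_t as defined in this README: sqrt(alpha_bar_t) / sqrt(1 - alpha_bar_t)
snr = np.sqrt(alpha_bar) / np.sqrt(1.0 - alpha_bar)

# Draw t with p(t) proportional to 1 / snr_t^2, so noisier (late) timesteps
# are sampled more often than nearly-clean (early) ones.
weights = 1.0 / snr**2
p_t = weights / weights.sum()
rng = np.random.default_rng(0)
t = rng.choice(num_steps, p=p_t)
```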
x_t and snr_t were taken from the DDPM paper (https://arxiv.org/abs/2006.11239):

$$
x_t = \sqrt{\bar{\alpha}_t} x + \sqrt{1 - \bar{\alpha}_t} \epsilon \\
snr_t = \frac{\sqrt{\bar{\alpha}_t}}{\sqrt{1 - \bar{\alpha}_t}}
$$
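A minimal numpy sketch of the forward-noising formula above; the schedule, timestep, and array shapes are illustrative assumptions, not values from this repo:

```python
import numpy as np

rng = np.random.default_rng(0)
num_steps = 1000
betas = np.linspace(1e-4, 0.02, num_steps)  # assumed linear DDPM schedule
alpha_bar = np.cumprod(1.0 - betas)

x = rng.normal(size=(4, 4))      # stand-in for a data sample
t = 500                          # arbitrary timestep for the example
eps = rng.normal(size=x.shape)   # epsilon ~ N(0, I)

# x_t = sqrt(alpha_bar_t) * x + sqrt(1 - alpha_bar_t) * eps
x_t = np.sqrt(alpha_bar[t]) * x + np.sqrt(1.0 - alpha_bar[t]) * eps
```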

It's important to note that L is not used to train any model here. Instead, we accumulate absolute gradients to estimate the importance of each parameter (explained below):
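A toy sketch of the accumulation step itself: the real objective involves full diffusion UNets, so a hypothetical quadratic loss with a closed-form gradient stands in for L here, and all names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
params = {"w1": rng.normal(size=4), "w2": rng.normal(size=4)}
importance = {name: np.zeros_like(p) for name, p in params.items()}

num_samples = 100
for _ in range(num_samples):
    x = rng.normal(size=4)  # stand-in for a sampled training example
    for name, p in params.items():
        # Gradient of the toy loss 0.5 * sum((p * x)**2) w.r.t. p,
        # written in closed form instead of using autograd.
        grad = p * x**2
        # Accumulate |grad|, not grad: signs would otherwise cancel
        # across samples and hide consistently influential parameters.
        importance[name] += np.abs(grad)

# Parameters with larger average accumulated |grad| are deemed more important.
importance = {n: v / num_samples for n, v in importance.items()}
```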