Commit 73bc6e2 (verified) by ljleb · 1 parent: eb48358

Update README.md

Files changed (1): README.md (+1, -1)
@@ -75,7 +75,7 @@ For Animagine, following the guidance from the readme of the huggingface release
 
 The prompts are then prepended for half of the dataset by a random artist following the distribution of artists in the Danbooru image board (only including users that were trained on, those that have a larger catalogue get more weight). In particular, at each gradient accumulation step, we either prepend an artist tag to the prompt of NoobAI (for even iterations) or that of Animagine (for odd iterations) but never both at the same time.
 
-The goal of this prompting strategy is to make sure the artists distribution of the models is at least partially covered (there are too many artists to sample them all). It increases L_0 which helps with the precision of calculated gradients, and activates slightly different paths in the two models which helps with covering an overall wider region of the loss landscape.
+The goal of this prompting strategy is to make sure the artists distribution of the models is at least partially covered (there are too many artists to sample them all). It increases L_0 which helps with reducing noise in the calculated gradients, and activates slightly different paths in the two models which helps with covering an overall wider region of the loss landscape.
 
 As we compare the outputs of the models directly, and not to an absolute expected epsilon noise map, this asymmetric prompting strategy does not affect too much the quality of accumulated gradients.
 