Document banlist biases
README.md
CHANGED
@@ -16,6 +16,15 @@ Then paste this whole damned thing into ST. Use the global banlist so you don't
After that? **Enjoy half as much LLM slop!**

+## Biases
+This list of tokens has inherent biases, and it also has selected ones: the vast majority of the banned strings are cliches that come up when the LLM is role-playing (or ERP'ing) or writing as one or more cis female characters. Some strings in there weren't chosen by that criterion, but they're strings likely to have come up in ao3 and/or RP forum datasets.
+
+That is to say: you can use these strings for banning tokens, but you'll probably be disappointed if the LLM is writing or using male characters. There are heaps upon heaps of human-generated and synthetic slop that won't have come up in my life at all, and I'd never even know it was there. At that point you're relying on the banned phrase list at the bottom.
+
+This bias is *intentional.* I will spend zero time seeking out that kind of slop in order to kill it. We can all hope that someone else, someone affected by it, will come along and add to the list.
+
+Ideally they'd add it at the bottom, so I can truncate it when I use this data.
+
## Credits
- Me. The top half of the file is shit I banned by hand over the course of months because I was tired of seeing it.
- [AntiSlop-Sampler](https://github.com/sam-paech/antislop-sampler)'s pile of common strings and phrases, which occupies the remainder of the file. It's largely just pasted in, so there's a bunch of stuff that's statistically likely to come up only if you read a lot of bad ao3 vampire fic. I've never seen, like, two thirds of it.