Blackroot committed
Commit 4566cb7 · verified · 1 Parent(s): dbd7a30

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -92,7 +92,7 @@ Now, I never thought I'd be the one saying this. But here I am, merging is effec
  This is a phenomenon that is hard to observe if you aren't looking for it, because the models are sneaky. You'll see this in ablations, where instead of direct refusals, models will skirt the point, redirect the conversation, or use doublespeak. These are also known as "soft refusals" where the model does not explicitly state it won't do something, but it will not in fact do the thing. This is an issue, if the model has a learned guardrail that thinks gore and violence are bad, and you're over here sword in hand slaying a dragon, you might want gore in your fairytale. But the model may disagree with the premise of fantasy, and instead you miss. That way the dragon never bleeds. This is the idea with reluctance redirection. After having put in more hours than I care to admit in testing these models, it's quite clear when this happens now. What's more annoying, this behaviour is very hard to get rid of. In fact, I just can't. There's always some degree of reluctance. Now, this is a bit of a problem, because I could entirely ablate these concepts, but that's actually bad. Abliteration (can) cause overcompliance, which I find to be the least human trait in a model. I try to avoid abliteration when possible, and I have not liked the abliteration results I've tested to get rid of reluctance redirection.
 
  # Low Coherence Areas
- These models have seen an absolutely ABSURD amount of data. Like truly staggering amounts. However, despite seeing a massive amount of data, not all of the data is balaned, so this causes areas of what I'll call "low coherence". When you get into one of these areas, prose collapses and the model basically has very few logits to choose from, so it ends up very boring, similar, and overall, it's not fun. No matter how many models you merge, if they don't have a strong distribution, they will not increase these areas of low coherence.
+ These models have seen an absolutely ABSURD amount of data. Like truly staggering amounts. However, despite seeing a massive amount of data, not all of the data is balanced, so this causes areas of what I'll call "low coherence". When you get into one of these areas, prose collapses and the model basically has very few logits to choose from, so it ends up very boring, similar, and overall, it's not fun. No matter how many models you merge, if they don't have a strong distribution, they will not increase these areas of low coherence.
 
  # "GPT-isms"/Model slop
  This is more of a well known thing, but I'll address it here too. Models like certain words, and phrases. This might happen as a result of the other two, or simply be over-represented in the weights of the model for specific conditions. Whatever the case, merging does seem to help reduce model slop, but it does not eliminate it. I still see "fingers drumming a stocatto rhythm" at times. Sadly.