Did something go wrong during training?

#6 opened by CamiloMM

You guys usually release top-tier LLM finetunes, so I wonder if something went wrong when training this one. Subjectively, it seems worse than base Mistral-Large-Instruct-2407 123B. "Objectively", it seems more censored according to UGI. And for some reason it does RP worse, which makes no sense.

Either way, if someone else has a different experience, maybe they can comment on this - maybe I'm alone in thinking this is somehow a slight downgrade.

NeverSleep org

We've never really trained a model this big, so it was a one-shot.
The loss seemed okay, and so did the unquantized version we used for our tests. I'm surprised, though, that our model turned out more censored than base.

It's censored? I'm just getting shorter replies. Sometimes it's more creative, but sometimes it's dumber.

Base Mistral with a JB does seem to function better.

> I'm just getting shorter replies. Sometimes it's more creative, but sometimes it's dumber.

Yeah, that as well. Usually the problem is the opposite, like busting past a 512-token budget, but a whole reply that's just *spine gets shivered*<EOT> or something is another problem entirely. I had to re-roll way too much and add instructions to the prompt asking it to write at length. I assume something went wrong during training, and the low score on UGI is just the intelligence drop.

You can't always win. But even if this turns out to be total crap, I'd still be sad if it stays a one-shot. Ganbatte!

You can also use the correct Mistral template and get a bit of a mix between the two.
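
For anyone unsure what "the correct Mistral template" looks like in practice, here's a minimal sketch, assuming the `transformers` chat-template API and the stock Mistral-Large-Instruct-2407 tokenizer (the model path is just an example; swap in whichever model or finetune you're actually running):

```python
from transformers import AutoTokenizer

# Example path only: point this at the tokenizer of the model you're actually using.
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-Large-Instruct-2407")

messages = [
    {"role": "user", "content": "Describe the tavern as my character walks in. Write at length, in detail."},
]

# apply_chat_template wraps the turns in Mistral's [INST] ... [/INST] format,
# so the special tokens don't have to be hand-written (and can't be typo'd).
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)  # e.g. "<s>[INST] Describe the tavern ... [/INST]"
```

In a frontend like SillyTavern, the equivalent is usually just selecting the Mistral instruct preset instead of an Alpaca or ChatML one.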
