Difference between this and the other (100 steps) model?

#1
by lemon07r - opened

I'm curious what the difference is between this model and the other one. The only difference I can see is in the name: the "100 steps".

Owner

The "AALF/gemma-2-27b-it-SimPO-37K-100steps" model is a checkpoint of "AALF/gemma-2-27b-it-SimPO-37K" saved after 100 global training steps.

Is this model from before or after those 100 steps?

After. Refer to trainer_state.json.
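
To check which step a checkpoint corresponds to, here is a minimal sketch that reads a locally downloaded trainer_state.json. It assumes the file follows the layout written by the Hugging Face transformers Trainer (top-level global_step, max_steps and epoch keys plus a log_history list); the helper names load_trainer_state, training_progress and last_logged_step are hypothetical and not part of any library.

```python
import json
from pathlib import Path
from typing import Optional, Tuple


def load_trainer_state(path: str) -> dict:
    # Parse the JSON file the Trainer writes alongside a saved checkpoint.
    return json.loads(Path(path).read_text(encoding="utf-8"))


def training_progress(path: str) -> Tuple[Optional[int], Optional[int], Optional[float]]:
    # Return (global_step, max_steps, epoch); any missing key comes back as None.
    state = load_trainer_state(path)
    return state.get("global_step"), state.get("max_steps"), state.get("epoch")


def last_logged_step(path: str) -> Optional[int]:
    # log_history is assumed to be a list of dicts; keep only entries with a "step" key.
    state = load_trainer_state(path)
    steps = [entry["step"] for entry in state.get("log_history", []) if "step" in entry]
    return max(steps) if steps else None
```

For example, training_progress("trainer_state.json") returns a tuple you can compare across the two repositories. If the owner's description holds, the "-100steps" checkpoint should report a global_step of 100.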

Which one should we use?

AALF/gemma-2-27b-it-SimPO-37K-100steps is better.