XiaomiMiMo/MiMo-VL-7B-RL · Correct base_model

Remove base_model (d36e7d80)

ngxson, May 30 (edited May 30)

Correct the base_model field so the model tree can be shown.

I assume that MiMo-VL-7B-SFT is the base model because of this explanation:

The development of MiMo-VL-7B involves two sequential training processes: (1) A four-stage pre-training phase, which includes projector warmup, vision-language alignment, general multi-modal pre-training, and long-context Supervised Fine-Tuning (SFT). This phase yields the MiMo-VL-7B-SFT model. (2) A subsequent post-training phase, where we introduce Mixed On-policy Reinforcement Learning (MORL), a novel framework that seamlessly integrates diverse reward signals spanning perception accuracy, visual grounding precision, logical reasoning capabilities, and human/AI preferences. This phase yields the MiMo-VL-7B-RL model.
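
For illustration, the corrected front matter in README.md might look roughly like the fragment below. This is only a sketch: the repo ID `XiaomiMiMo/MiMo-VL-7B-SFT` is inferred from the explanation above, and the optional `base_model_relation` line is my assumption and not necessarily part of this change.

```yaml
---
# Assumed value: the SFT checkpoint that the RL model was post-trained from.
base_model: XiaomiMiMo/MiMo-VL-7B-SFT
# Optional and assumed here: states the relation explicitly for the model tree.
base_model_relation: finetune
---
```

With `base_model` pointing at an existing repo, the Hub should be able to place MiMo-VL-7B-RL under the SFT model in the model tree.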

ngxson changed pull request title from "Remove base_model" to "Correct base_model" on May 30

Update README.md (53cc57be)