VL-Cogito

This is the homepage of VL-Cogito, our multimodal reasoning model. Inspired by the Latin word “Cogito” (“I think”), VL-Cogito is built for complex and diverse multimodal reasoning tasks, with a strong focus on autonomous thinking and adaptability.

What makes VL-Cogito stand out?

Progressive Curriculum Reinforcement Learning (PCuRL): A multi-stage, easy-to-hard reinforcement learning approach that significantly improves VL-Cogito’s reasoning across a wide range of multimodal scenarios.

Two key innovations:

  • Online difficulty weighting: Dynamically adjusts training difficulty, allowing the model to progress step by step from easier to more challenging examples.
  • Dynamic length reward: Encourages the model to adapt the length of its reasoning to the complexity of each individual problem, balancing accuracy and efficiency.
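
The two ideas above can be illustrated with a small sketch. This is not the released training code. The function names, the Gaussian weighting shape, the stage targets, and the linear length penalty are all assumptions chosen for illustration. The sketch only shows one plausible way to (a) weight a prompt by how well its difficulty matches the current curriculum stage and (b) reward responses whose length is close to a per-problem target.

```python
import math
from statistics import mean

# Hypothetical three-stage schedule (an assumption, not taken from the model card):
# the preferred rollout pass rate falls as training moves from easy to hard.
STAGE_TARGETS = {"easy": 0.8, "medium": 0.5, "hard": 0.2}


def difficulty_weight(pass_rate: float, target: float, width: float = 0.25) -> float:
    """Soft weight in (0, 1]: largest when the prompt's pass rate is near `target`.

    `pass_rate` is the fraction of sampled rollouts that solved the prompt,
    used here as an online difficulty estimate (illustrative choice).
    """
    return math.exp(-((pass_rate - target) ** 2) / (2 * width ** 2))


def target_length_from_rollouts(lengths, correct):
    """Per-problem target length: mean length of the correct rollouts.

    Returns None when no rollout is correct, so the caller can skip the length term.
    """
    ok = [n for n, c in zip(lengths, correct) if c]
    return mean(ok) if ok else None


def length_reward(length: int, target_length: float) -> float:
    """Reward in [0, 1]: 1.0 at the target, decaying linearly with relative deviation."""
    if target_length <= 0:
        return 0.0
    deviation = abs(length - target_length) / target_length
    return max(0.0, 1.0 - deviation)


if __name__ == "__main__":
    # A prompt solved by 90% of rollouts counts for more in the easy stage than the hard one.
    print(difficulty_weight(0.9, STAGE_TARGETS["easy"]))
    print(difficulty_weight(0.9, STAGE_TARGETS["hard"]))

    # Target length derived from the correct rollouts of a single problem.
    target = target_length_from_rollouts([100, 200, 400], [True, True, False])
    print(target, length_reward(150, target), length_reward(400, target))
```

Because the length target is computed per problem, harder problems whose correct solutions are long are not penalized for long reasoning, while easy problems are nudged toward short answers.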

Outstanding Performance:

VL-Cogito delivers stable results that match or exceed the state of the art on mainstream multimodal reasoning benchmarks covering mathematics, science, logic, and commonsense understanding.

The framework of our model.

Model size: 8.29B params · Tensor type: BF16 · Format: Safetensors