

Aurora-M
AI & ML interests
Omnilingual Models
We are a group of volunteer researchers dedicated to promoting equal access to multimodal and multilingual AI. Our goal is to build a permissively licensed, open stack for developing multimodal LLMs. This initiative is a collaborative effort led by OntocordAI. We began as an effort named MDEL (Multi-Domain Expert Learning).
The -m in Aurora-M refers to our focus on multimodal, multilingual, multi-domain mixture-of-experts (MoE) models, directions we aim to explore and develop through ongoing research.
Building on our previous success, Aurora-M: Open Source Continual Pre-training for Multilingual Language and Code, we are training a new family of models called Aurora-M2, aligned with laws, regulations, and policies for controllable AI. The series will include 3B, 8B, and 21B parameter models, all aligned to the comprehensive policy framework of the EU AI Act, specifically its Annex III.
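In the meantime, the first-generation Aurora-M checkpoints can be loaded with the standard Hugging Face transformers API. Below is a minimal sketch; the repository ID aurora-m/aurora-m-base is an assumption based on this organization's namespace, so substitute the exact checkpoint you want to use.

```python
# Minimal sketch: loading an Aurora-M checkpoint with Hugging Face transformers.
# The repo ID "aurora-m/aurora-m-base" is an assumption; replace it with the
# checkpoint you want from this organization's model list.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "aurora-m/aurora-m-base"  # assumed repository ID

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half precision to reduce GPU memory use
    device_map="auto",           # requires the `accelerate` package
)

# Aurora-M was continually pre-trained on multilingual text and code,
# so both natural-language and code prompts are reasonable inputs.
prompt = "def fibonacci(n: int) -> int:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```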
As part of our commitment to openness, we plan to open-source the entire training pipeline and experimental process, including data synthesis and the evolving methodologies we employ in model training. Stay tuned!