---
license: mit
library_name: transformers
pipeline_tag: any-to-any
---

# MMaDA-8B-Base

We introduce MMaDA, a novel class of multimodal diffusion foundation models designed to achieve superior performance across diverse domains such as textual reasoning, multimodal understanding, and text-to-image generation. MMaDA is distinguished by three key innovations:

1. MMaDA adopts a unified diffusion architecture with a shared probabilistic formulation and a modality-agnostic design, eliminating the need for modality-specific components.
2. MMaDA introduces a mixed long chain-of-thought (CoT) fine-tuning strategy that curates a unified CoT format across modalities.
3. MMaDA adopts a unified policy-gradient-based RL algorithm, which we call UniGRPO, tailored for diffusion foundation models. Utilizing diversified reward modeling, UniGRPO unifies post-training across both reasoning and generation tasks, ensuring consistent performance improvements.

[Paper](https://arxiv.org/abs/2505.15809) | [Code](https://github.com/Gen-Verse/MMaDA) | [Demo](https://huggingface.co/spaces/Gen-Verse/MMaDA)
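# Usage

A minimal, hedged loading sketch with the Hugging Face Transformers library is given below. The repo id `Gen-Verse/MMaDA-8B-Base`, the use of `AutoModel`/`AutoTokenizer` with `trust_remote_code=True`, and the bfloat16/GPU settings are assumptions for illustration only; refer to the [Code](https://github.com/Gen-Verse/MMaDA) repository for the official inference and generation pipelines.

```python
# Hedged sketch: repo id and AutoModel/trust_remote_code usage are assumptions,
# not the official loading recipe.
import torch
from transformers import AutoModel, AutoTokenizer

model_id = "Gen-Verse/MMaDA-8B-Base"  # assumed Hugging Face repo id

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModel.from_pretrained(
    model_id,
    trust_remote_code=True,       # custom model code is served from the repo
    torch_dtype=torch.bfloat16,   # assumed precision for an 8B-parameter model
)
model = model.to("cuda").eval()   # move to GPU for inference
```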
# Citation
```bibtex
@article{yang2025mmada,
  title={MMaDA: Multimodal Large Diffusion Language Models},
  author={Yang, Ling and Tian, Ye and Li, Bowen and Zhang, Xinchen and Shen, Ke and Tong, Yunhai and Wang, Mengdi},
  journal={arXiv preprint arXiv:2505.15809},
  year={2025}
}
```