---
base_model:
- wusize/Harmon-1_5B
datasets:
- brivangl/midjourney-v6-llava
language:
- en
- zh
license: apache-2.0
pipeline_tag: any-to-any
---
Harmon-1.5B-RecA
A self-supervised training framework that aligns understanding and generation with modest compute, delivering large zero-shot gains in image generation and editing.
This repository hosts the model weights for Harmon-1.5B-RecA, introduced in the paper Reconstruction Alignment Improves Unified Multimodal Models (arXiv:2509.07295). For installation, usage instructions, and further documentation, please visit Harmon's original GitHub repository.
Method
Benchmarks
| Model | GenEval ↑ | DPGBench ↑ | WISE ↑ |
|---|---|---|---|
| Harmon-1.5B | 0.73 | 80.93 | 0.41 |
| Harmon-1.5B-RecA | 0.86 | 87.21 | 0.50 |
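The table reports absolute scores only. To put a number on the zero-shot gain, one can compute the absolute and relative improvement of RecA over the base model for each benchmark. The short Python sketch below does exactly that using only the values in the table; the `gains` helper and the `scores` dictionary are hypothetical names written for this illustration, not part of the Harmon or RecA codebase.

```python
# Scores copied from the benchmark table (higher is better on all three).
# Each entry is (Harmon-1.5B, Harmon-1.5B-RecA).
scores = {
    "GenEval": (0.73, 0.86),
    "DPGBench": (80.93, 87.21),
    "WISE": (0.41, 0.50),
}


def gains(base: float, reca: float) -> tuple[float, float]:
    """Return (absolute gain, relative gain in percent) of RecA over the base model."""
    absolute = reca - base
    relative = 100.0 * absolute / base
    # Round to the precision used in the table to hide floating-point noise.
    return round(absolute, 2), round(relative, 1)


for name, (base, reca) in scores.items():
    abs_gain, rel_gain = gains(base, reca)
    print(f"{name}: +{abs_gain} ({rel_gain:+.1f}%)")

# Prints:
# GenEval: +0.13 (+17.8%)
# DPGBench: +6.28 (+7.8%)
# WISE: +0.09 (+22.0%)
```

Reporting relative gains alongside absolute ones matters here because the three benchmarks use different scales (GenEval and WISE are in [0, 1], DPGBench is out of 100).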
Citation
If you find our work inspiring or use our codebase in your research, please consider giving the repository a star ⭐ and citing our paper:
```bibtex
@misc{xie2025reconstructionalignmentimprovesunified,
  title={Reconstruction Alignment Improves Unified Multimodal Models},
  author={Ji Xie and Trevor Darrell and Luke Zettlemoyer and XuDong Wang},
  year={2025},
  eprint={2509.07295},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2509.07295},
}
```