base_model:
  - wusize/Harmon-1_5B
datasets:
  - brivangl/midjourney-v6-llava
language:
  - en
  - zh
license: apache-2.0
pipeline_tag: any-to-any

Harmon-1.5B-RecA

A self-supervised training framework that aligns understanding and generation with modest compute, yielding large zero-shot gains in generation and editing capability.

This repository hosts the model weights for Harmon-1.5B-RecA, a model from the paper Reconstruction Alignment Improves Unified Multimodal Models. For installation, usage instructions, and further documentation, please visit Harmon's original GitHub repository.
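For readers who only need the checkpoint files locally, the following is a minimal sketch using `huggingface_hub.snapshot_download`. The repository id `sanaka87/Harmon-1.5B-RecA` is an assumption inferred from this model card's owner and name, and the `local_dir` path and the `download_weights` helper are hypothetical. Loading and running the model still follows the instructions in Harmon's GitHub repository.

```python
from pathlib import Path

# Assumption: repository id inferred from this model card's owner and name.
REPO_ID = "sanaka87/Harmon-1.5B-RecA"


def download_weights(local_dir: str = "checkpoints/Harmon-1.5B-RecA") -> Path:
    """Download every file in the model repository and return the local folder.

    Hypothetical helper: requires `pip install huggingface_hub` and network access.
    """
    # Imported lazily so the module can be imported without the dependency.
    from huggingface_hub import snapshot_download

    path = snapshot_download(repo_id=REPO_ID, local_dir=local_dir)
    return Path(path)


# Example usage (downloads the full checkpoint, so it is left commented out):
# weights_dir = download_weights()
# print(sorted(p.name for p in weights_dir.iterdir()))
```

Keeping the download in a function avoids fetching the checkpoint as a side effect of importing the script.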

🧠 Method

Paper (arXiv) | GitHub | Hugging Face Collection | HF Demo | Project Page

📊 Benchmarks

| Model | GenEval ↑ | DPGBench ↑ | WISE ↑ |
| --- | --- | --- | --- |
| Harmon-1.5B | 0.73 | 80.93 | 0.41 |
| Harmon-1.5B-RecA | 0.86 | 87.21 | 0.50 |
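To make the size of these gains concrete, the short sketch below computes the absolute and relative improvement of Harmon-1.5B-RecA over Harmon-1.5B for each benchmark. The scores are copied from the table above; the `SCORES` dictionary and the `gains` helper are illustrative names, not part of the released code.

```python
# Scores copied from the benchmark table: (Harmon-1.5B, Harmon-1.5B-RecA).
SCORES = {
    "GenEval": (0.73, 0.86),
    "DPGBench": (80.93, 87.21),
    "WISE": (0.41, 0.50),
}


def gains(baseline: float, reca: float) -> tuple[float, float]:
    """Return (absolute gain, relative gain in percent) over the baseline."""
    absolute = reca - baseline
    relative = 100.0 * absolute / baseline
    return absolute, relative


for name, (baseline, reca) in SCORES.items():
    absolute, relative = gains(baseline, reca)
    print(f"{name}: +{absolute:.2f} absolute, +{relative:.1f}% relative")

# Prints:
# GenEval: +0.13 absolute, +17.8% relative
# DPGBench: +6.28 absolute, +7.8% relative
# WISE: +0.09 absolute, +22.0% relative
```

Note that GenEval and WISE are reported on a 0 to 1 scale while DPGBench is on a 0 to 100 scale, so the relative figures are the easier ones to compare across columns.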

✍️ Citation

If you find our work inspiring or use our codebase in your research, please consider giving the repository a star ⭐ and citing the paper.

@misc{xie2025reconstructionalignmentimprovesunified,
      title={Reconstruction Alignment Improves Unified Multimodal Models}, 
      author={Ji Xie and Trevor Darrell and Luke Zettlemoyer and XuDong Wang},
      year={2025},
      eprint={2509.07295},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2509.07295}, 
}