This repository contains the 1B version of the model presented in the paper "UniLIP: Adapting CLIP for Unified Multimodal Understanding, Generation and Editing".

UniLIP proposes a unified CLIP-based encoder that captures both rich semantics and fine-grained image details. Through two-stage training with self-distillation for reconstruction, we empower CLIP to achieve excellent reconstruction quality without compromising its original understanding abilities. Leveraging this unified representation, UniLIP performs strongly across understanding, generation, and editing tasks.

For more details, please refer to the original paper and the GitHub repository:

Paper: https://www.arxiv.org/abs/2507.23278

GitHub: https://github.com/nnnth/UniLIP
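
As a minimal sketch (not taken from the original repository), the checkpoint files can be fetched from the Hugging Face Hub with `huggingface_hub`; the repo id `kanashi6/UniLIP-1B` is assumed here, and the GitHub repository above should be consulted for the project's own model-loading and inference code.

```python
# Minimal sketch: download the UniLIP-1B checkpoint files from the Hugging Face Hub.
# Assumes the repo id "kanashi6/UniLIP-1B"; see the GitHub repository for the
# project's own loading and inference pipeline.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(repo_id="kanashi6/UniLIP-1B")
print(f"Checkpoint downloaded to: {local_dir}")
```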
