This repository contains the 1B model presented in the paper UniLIP: Adapting CLIP for Unified Multimodal Understanding, Generation and Editing.
UniLIP proposes a unified, CLIP-based encoder that captures both rich semantics and fine-grained image details. Through a two-stage training scheme with self-distillation for reconstruction, CLIP is empowered to achieve excellent reconstruction quality without compromising its original understanding abilities. Leveraging this powerful unified representation, UniLIP excels across understanding, generation, and editing tasks.
For more details, please refer to the original paper and the GitHub repository:
Paper: https://www.arxiv.org/abs/2507.23278
GitHub: https://github.com/nnnth/UniLIP
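A minimal loading sketch is shown below. It assumes the checkpoint follows the usual transformers AutoModel / trust_remote_code convention used by InternVL3-based models; the exact interface may differ, so please consult the GitHub repository for the authors' actual loading and inference scripts.

```python
# Hypothetical loading sketch (not the authors' official usage).
# Assumption: the repo ships custom modeling code loadable via trust_remote_code.
import torch
from transformers import AutoModel, AutoTokenizer

model_id = "kanashi6/UniLIP-1B"  # this Hugging Face repository

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModel.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
).eval().cuda()
```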
Model tree for kanashi6/UniLIP-1B
- Base model: OpenGVLab/InternVL3-1B-Pretrained
- Finetuned: OpenGVLab/InternVL3-1B-Instruct
- Finetuned: OpenGVLab/InternVL3-1B