Add image-to-image pipeline tag and library name
This PR adds the `image-to-image` pipeline tag to the model card, ensuring that people can find this model at https://huggingface.co/models?pipeline_tag=image-to-image. It also adds the `library_name` tag for better searchability, and it updates the license listed in the "Model Description" section to MIT so that it matches the `license: mit` entry in the metadata.
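For context, the `library_name: diffusers` tag is what lets the Hub attach an automated usage snippet to the model page. Below is a minimal sketch of how the checkpoint could then be loaded, assuming it is stored in the diffusers format; the repo id is a placeholder, not something this PR confirms:

```python
# Minimal sketch -- assumes the checkpoint is diffusers-compatible.
# "Xiaomabufei/Lumos-I2I" is a placeholder repo id, not confirmed by this PR.
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained("Xiaomabufei/Lumos-I2I")
pipe.to("cuda")  # optional: move the pipeline to GPU
```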
README.md CHANGED

```diff
@@ -1,5 +1,7 @@
 ---
 license: mit
+library_name: diffusers
+pipeline_tag: image-to-image
 tags:
 - lumos
 - image to image
@@ -7,6 +9,7 @@ tags:
 - novel view synthesis
 - image to video
 ---
+
 <p align="center">
 <img src="asset/logo.gif" height=20>
 </p>
@@ -41,7 +44,7 @@ Source code is available at https://github.com/xiaomabufei/lumos.
 
 - **Developed by:** Lumos
 - **Model type:** Diffusion-Transformer-based generative model
-- **License:**
+- **License:** MIT
 - **Model Description:** **Lumos-I2I** is a model designed for generating images based on image prompts. It utilizes a [Transformer Latent Diffusion architecture](https://arxiv.org/abs/2310.00426) and incorporates a fixed, pretrained vision encoder ([DINO](
 https://dl.fbaipublicfiles.com/dino/dino_vitbase16_pretrain/dino_vitbase16_pretrain.pth)). **Lumos-T2I** is a model that can be used to generate images based on text prompts.
 It is a [Transformer Latent Diffusion Model](https://arxiv.org/abs/2310.00426) that uses one fixed, pretrained text encoder ([T5](
```
|