Gokuldaskumar h94 commited on
Commit
9e82a70
·
verified ·
0 Parent(s):

Duplicate from h94/IP-Adapter

Browse files

Co-authored-by: xiaohu <[email protected]>

.gitattributes ADDED
@@ -0,0 +1,35 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ *.7z filter=lfs diff=lfs merge=lfs -text
2
+ *.arrow filter=lfs diff=lfs merge=lfs -text
3
+ *.bin filter=lfs diff=lfs merge=lfs -text
4
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
5
+ *.ckpt filter=lfs diff=lfs merge=lfs -text
6
+ *.ftz filter=lfs diff=lfs merge=lfs -text
7
+ *.gz filter=lfs diff=lfs merge=lfs -text
8
+ *.h5 filter=lfs diff=lfs merge=lfs -text
9
+ *.joblib filter=lfs diff=lfs merge=lfs -text
10
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
11
+ *.mlmodel filter=lfs diff=lfs merge=lfs -text
12
+ *.model filter=lfs diff=lfs merge=lfs -text
13
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
14
+ *.npy filter=lfs diff=lfs merge=lfs -text
15
+ *.npz filter=lfs diff=lfs merge=lfs -text
16
+ *.onnx filter=lfs diff=lfs merge=lfs -text
17
+ *.ot filter=lfs diff=lfs merge=lfs -text
18
+ *.parquet filter=lfs diff=lfs merge=lfs -text
19
+ *.pb filter=lfs diff=lfs merge=lfs -text
20
+ *.pickle filter=lfs diff=lfs merge=lfs -text
21
+ *.pkl filter=lfs diff=lfs merge=lfs -text
22
+ *.pt filter=lfs diff=lfs merge=lfs -text
23
+ *.pth filter=lfs diff=lfs merge=lfs -text
24
+ *.rar filter=lfs diff=lfs merge=lfs -text
25
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
26
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
27
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
28
+ *.tar filter=lfs diff=lfs merge=lfs -text
29
+ *.tflite filter=lfs diff=lfs merge=lfs -text
30
+ *.tgz filter=lfs diff=lfs merge=lfs -text
31
+ *.wasm filter=lfs diff=lfs merge=lfs -text
32
+ *.xz filter=lfs diff=lfs merge=lfs -text
33
+ *.zip filter=lfs diff=lfs merge=lfs -text
34
+ *.zst filter=lfs diff=lfs merge=lfs -text
35
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,46 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ ---
2
+ tags:
3
+ - text-to-image
4
+ - stable-diffusion
5
+ license: apache-2.0
6
+ language:
7
+ - en
8
+ library_name: diffusers
9
+ ---
10
+
11
+ # IP-Adapter Model Card
12
+
13
+
14
+ <div align="center">
15
+
16
+ [**Project Page**](https://ip-adapter.github.io) **|** [**Paper (ArXiv)**](https://arxiv.org/abs/2308.06721) **|** [**Code**](https://github.com/tencent-ailab/IP-Adapter)
17
+ </div>
18
+
19
+ ---
20
+
21
+
22
+ ## Introduction
23
+
24
+ we present IP-Adapter, an effective and lightweight adapter to achieve image prompt capability for the pre-trained text-to-image diffusion models. An IP-Adapter with only 22M parameters can achieve comparable or even better performance to a fine-tuned image prompt model. IP-Adapter can be generalized not only to other custom models fine-tuned from the same base model, but also to controllable generation using existing controllable tools. Moreover, the image prompt can also work well with the text prompt to accomplish multimodal image generation.
25
+
26
+ ![arch](./fig1.png)
27
+
28
+ ## Models
29
+
30
+ ### Image Encoder
31
+ - [models/image_encoder](https://huggingface.co/h94/IP-Adapter/tree/main/models/image_encoder): [OpenCLIP-ViT-H-14](https://huggingface.co/laion/CLIP-ViT-H-14-laion2B-s32B-b79K) with 632.08M parameter
32
+ - [sdxl_models/image_encoder](https://huggingface.co/h94/IP-Adapter/tree/main/sdxl_models/image_encoder): [OpenCLIP-ViT-bigG-14](https://huggingface.co/laion/CLIP-ViT-bigG-14-laion2B-39B-b160k) with 1844.9M parameter
33
+
34
+ More information can be found [here](https://laion.ai/blog/giant-openclip/)
35
+
36
+ ### IP-Adapter for SD 1.5
37
+ - [ip-adapter_sd15.bin](https://huggingface.co/h94/IP-Adapter/blob/main/models/ip-adapter_sd15.bin): use global image embedding from OpenCLIP-ViT-H-14 as condition
38
+ - [ip-adapter_sd15_light.bin](https://huggingface.co/h94/IP-Adapter/blob/main/models/ip-adapter_sd15_light.bin): same as ip-adapter_sd15, but more compatible with text prompt
39
+ - [ip-adapter-plus_sd15.bin](https://huggingface.co/h94/IP-Adapter/blob/main/models/ip-adapter-plus_sd15.bin): use patch image embeddings from OpenCLIP-ViT-H-14 as condition, closer to the reference image than ip-adapter_sd15
40
+ - [ip-adapter-plus-face_sd15.bin](https://huggingface.co/h94/IP-Adapter/blob/main/models/ip-adapter-plus-face_sd15.bin): same as ip-adapter-plus_sd15, but use cropped face image as condition
41
+
42
+ ### IP-Adapter for SDXL 1.0
43
+ - [ip-adapter_sdxl.bin](https://huggingface.co/h94/IP-Adapter/blob/main/sdxl_models/ip-adapter_sdxl.bin): use global image embedding from OpenCLIP-ViT-bigG-14 as condition
44
+ - [ip-adapter_sdxl_vit-h.bin](https://huggingface.co/h94/IP-Adapter/blob/main/sdxl_models/ip-adapter_sdxl_vit-h.bin): same as ip-adapter_sdxl, but use OpenCLIP-ViT-H-14
45
+ - [ip-adapter-plus_sdxl_vit-h.bin](https://huggingface.co/h94/IP-Adapter/blob/main/sdxl_models/ip-adapter-plus_sdxl_vit-h.bin): use patch image embeddings from OpenCLIP-ViT-H-14 as condition, closer to the reference image than ip-adapter_xl and ip-adapter_sdxl_vit-h
46
+ - [ip-adapter-plus-face_sdxl_vit-h.bin](https://huggingface.co/h94/IP-Adapter/blob/main/sdxl_models/ip-adapter-plus-face_sdxl_vit-h.bin): same as ip-adapter-plus_sdxl_vit-h, but use cropped face image as condition
fig1.png ADDED
models/image_encoder/config.json ADDED
@@ -0,0 +1,23 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "_name_or_path": "./image_encoder",
3
+ "architectures": [
4
+ "CLIPVisionModelWithProjection"
5
+ ],
6
+ "attention_dropout": 0.0,
7
+ "dropout": 0.0,
8
+ "hidden_act": "gelu",
9
+ "hidden_size": 1280,
10
+ "image_size": 224,
11
+ "initializer_factor": 1.0,
12
+ "initializer_range": 0.02,
13
+ "intermediate_size": 5120,
14
+ "layer_norm_eps": 1e-05,
15
+ "model_type": "clip_vision_model",
16
+ "num_attention_heads": 16,
17
+ "num_channels": 3,
18
+ "num_hidden_layers": 32,
19
+ "patch_size": 14,
20
+ "projection_dim": 1024,
21
+ "torch_dtype": "float16",
22
+ "transformers_version": "4.28.0.dev0"
23
+ }
models/image_encoder/model.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:6ca9667da1ca9e0b0f75e46bb030f7e011f44f86cbfb8d5a36590fcd7507b030
3
+ size 2528373448
models/image_encoder/pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:3d3ec1e66737f77a4f3bc2df3c52eacefc69ce7825e2784183b1d4e9877d9193
3
+ size 2528481905
models/ip-adapter-full-face_sd15.bin ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:47ec4644114f3bfe25b2fc830af6b0dd8dcad9a0371a238b9cc919465c60d1dc
3
+ size 43592551
models/ip-adapter-full-face_sd15.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:f4a17fb643bf876235a45a0e87a49da2855be6584b28ca04c62a97ab5ff1c6f3
3
+ size 43592352
models/ip-adapter-plus-face_sd15.bin ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:aa09c22b49ef63474dcde12f26a35b8b8e9b755b716a553aa29e8dbe8d21e0c9
3
+ size 98183381
models/ip-adapter-plus-face_sd15.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:1c9edc21af6f737dc1d6e0e734190e976cfacf802d6b024b77aa3be922f7569b
3
+ size 98183288
models/ip-adapter-plus_sd15.bin ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:1cb77fc0613369b66be1531cc452b823a4af7d87ee56956000a69fc39e3817ba
3
+ size 158033179
models/ip-adapter-plus_sd15.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:a1c250be40455cc61a43da1201ec3f1edaea71214865fb47f57927e06cbe4996
3
+ size 98183288
models/ip-adapter_sd15.bin ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:68e1df30d760f280e578c302f1e73b37ea08654eff16a31153588047affe0058
3
+ size 44642825
models/ip-adapter_sd15.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:289b45f16d043d0bf542e45831f971dcdaabe18b656f11e86d9dfba7e9ee3369
3
+ size 44642768
models/ip-adapter_sd15_light.bin ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:f71bfbdd937f2edad0c894ec72d12db02b3be0316f62988e5fc669ca4da6b7e1
3
+ size 44642819
models/ip-adapter_sd15_light.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:0747d08db670535bfa286452a77d93cebad5c677b46d038543f9f2de8690bb26
3
+ size 44642768
models/ip-adapter_sd15_vit-G.bin ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:1398e9ae37cb65553a8525871830a283914dafd9ec3039716344a826399ec474
3
+ size 46215689
models/ip-adapter_sd15_vit-G.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:a26f736af07bb341a83dfea23713531d0575760e8ed947c68cb31a4c62d9c90b
3
+ size 46215640
sdxl_models/image_encoder/config.json ADDED
@@ -0,0 +1,81 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "architectures": [
3
+ "CLIPVisionModelWithProjection"
4
+ ],
5
+ "_name_or_path": "",
6
+ "add_cross_attention": false,
7
+ "architectures": null,
8
+ "attention_dropout": 0.0,
9
+ "bad_words_ids": null,
10
+ "begin_suppress_tokens": null,
11
+ "bos_token_id": null,
12
+ "chunk_size_feed_forward": 0,
13
+ "cross_attention_hidden_size": null,
14
+ "decoder_start_token_id": null,
15
+ "diversity_penalty": 0.0,
16
+ "do_sample": false,
17
+ "dropout": 0.0,
18
+ "early_stopping": false,
19
+ "encoder_no_repeat_ngram_size": 0,
20
+ "eos_token_id": null,
21
+ "exponential_decay_length_penalty": null,
22
+ "finetuning_task": null,
23
+ "forced_bos_token_id": null,
24
+ "forced_eos_token_id": null,
25
+ "hidden_act": "gelu",
26
+ "hidden_size": 1664,
27
+ "id2label": {
28
+ "0": "LABEL_0",
29
+ "1": "LABEL_1"
30
+ },
31
+ "image_size": 224,
32
+ "initializer_factor": 1.0,
33
+ "initializer_range": 0.02,
34
+ "intermediate_size": 8192,
35
+ "is_decoder": false,
36
+ "is_encoder_decoder": false,
37
+ "label2id": {
38
+ "LABEL_0": 0,
39
+ "LABEL_1": 1
40
+ },
41
+ "layer_norm_eps": 1e-05,
42
+ "length_penalty": 1.0,
43
+ "max_length": 20,
44
+ "min_length": 0,
45
+ "model_type": "clip_vision_model",
46
+ "no_repeat_ngram_size": 0,
47
+ "num_attention_heads": 16,
48
+ "num_beam_groups": 1,
49
+ "num_beams": 1,
50
+ "num_channels": 3,
51
+ "num_hidden_layers": 48,
52
+ "num_return_sequences": 1,
53
+ "output_attentions": false,
54
+ "output_hidden_states": false,
55
+ "output_scores": false,
56
+ "pad_token_id": null,
57
+ "patch_size": 14,
58
+ "prefix": null,
59
+ "problem_type": null,
60
+ "pruned_heads": {},
61
+ "remove_invalid_values": false,
62
+ "repetition_penalty": 1.0,
63
+ "return_dict": true,
64
+ "return_dict_in_generate": false,
65
+ "sep_token_id": null,
66
+ "suppress_tokens": null,
67
+ "task_specific_params": null,
68
+ "temperature": 1.0,
69
+ "tf_legacy_loss": false,
70
+ "tie_encoder_decoder": false,
71
+ "tie_word_embeddings": true,
72
+ "tokenizer_class": null,
73
+ "top_k": 50,
74
+ "top_p": 1.0,
75
+ "torch_dtype": null,
76
+ "torchscript": false,
77
+ "transformers_version": "4.24.0",
78
+ "typical_p": 1.0,
79
+ "use_bfloat16": false,
80
+ "projection_dim": 1280
81
+ }
sdxl_models/image_encoder/model.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:657723e09f46a7c3957df651601029f66b1748afb12b419816330f16ed45d64d
3
+ size 3689912664
sdxl_models/image_encoder/pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:2999562fbc02f9dc0d9c0acb7cf0970ec3a9b2a578d7d05afe82191d606d2d80
3
+ size 3690112753
sdxl_models/ip-adapter-plus-face_sdxl_vit-h.bin ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:50e886d82940b3c5873d80c2b06d8a4b0d0fccec70bc44fd53f16ac3cfd7fc36
3
+ size 1013454761
sdxl_models/ip-adapter-plus-face_sdxl_vit-h.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:677ad8860204f7d0bfba12d29e6c31ded9beefdf3e4bbd102518357d31a292c1
3
+ size 847517512
sdxl_models/ip-adapter-plus_sdxl_vit-h.bin ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:ec70edb7cc8e769c9388d94eeaea3e4526352c9fae793a608782d1d8951fde90
3
+ size 1013454427
sdxl_models/ip-adapter-plus_sdxl_vit-h.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:3f5062b8400c94b7159665b21ba5c62acdcd7682262743d7f2aefedef00e6581
3
+ size 847517512
sdxl_models/ip-adapter_sdxl.bin ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:7525f2731e9e86d1368e0b68467615d55dda459691965bdd7d37fa3d7fd84c12
3
+ size 702585097
sdxl_models/ip-adapter_sdxl.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:ba1002529e783604c5f326d49f0122025392d1d20ac8d573b3eeb3e6dea4ebb6
3
+ size 702585376
sdxl_models/ip-adapter_sdxl_vit-h.bin ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:6b382e2501d0ab3fe2e09312e561a59cd3f21262aff25373700e0cd62c635929
3
+ size 698390793
sdxl_models/ip-adapter_sdxl_vit-h.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:ebf05d918348aec7abb02a5e9ecef77e0aaea6914a5c4ea13f50d45eb1681831
3
+ size 698391064