Commit 7ca96c3
Parent(s): 089b117
Update README.md
README.md CHANGED
@@ -23,7 +23,7 @@ DistillCLIP is a distilled version of CLIP. Specficially, the teacher model was
 
 The knowledge distillation scheme of CLIP is presented below:
 
-<img src="https://huggingface.co/Ramos-Ramos/distillclip/resolve/main/distillclip_overview.svg" width="
+<img src="https://huggingface.co/Ramos-Ramos/distillclip/resolve/main/distillclip_overview.svg" width="75%" height="75%">
 
 CLIP is distilled with two losses: $L_{inter}$ and $L_{intra}$. These losses respectively distill the inter-modal (image-text) and intra-modal (image-image, text-text) similarity maps with MSE losses. The final distillation loss is the sum of the two losses, or $L = L_{inter} + L_{intra}$.
 
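
For reference, a minimal PyTorch sketch of the similarity-map distillation described in the README paragraph above. The function name, tensor names, and embedding shapes are illustrative assumptions, not the repository's actual training code.

import torch
import torch.nn.functional as F

def distillation_loss(t_img, t_txt, s_img, s_txt):
    """Sketch of the similarity-map distillation loss (assumed implementation).

    t_img, t_txt: teacher image/text embeddings, shape (batch, dim_t)
    s_img, s_txt: student image/text embeddings, shape (batch, dim_s)
    """
    # L2-normalize so dot products give cosine similarity maps
    t_img, t_txt = F.normalize(t_img, dim=-1), F.normalize(t_txt, dim=-1)
    s_img, s_txt = F.normalize(s_img, dim=-1), F.normalize(s_txt, dim=-1)

    # L_inter: inter-modal (image-text) similarity map, shape (batch, batch)
    l_inter = F.mse_loss(s_img @ s_txt.T, t_img @ t_txt.T)

    # L_intra: intra-modal (image-image and text-text) similarity maps
    l_intra = F.mse_loss(s_img @ s_img.T, t_img @ t_img.T) \
            + F.mse_loss(s_txt @ s_txt.T, t_txt @ t_txt.T)

    # Final distillation loss: L = L_inter + L_intra
    return l_inter + l_intra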