fixed typos and added info pointed out in evaluation
README.md CHANGED
@@ -7,7 +7,7 @@ tags:
 
 ## Model Details
 
-This model is a fine-tuned [CLIP by OpenAI](https://huggingface.co/openai/clip-vit-base-patch32). It is designed with aim to improve zero-shot image classification, text-to-image and image-to-image retrieval specifically on remote sensing images.
+This model is a fine-tuned [CLIP by OpenAI](https://huggingface.co/openai/clip-vit-base-patch32). It is designed with an aim to improve zero-shot image classification, text-to-image and image-to-image retrieval specifically on remote sensing images.
 
 ### Model Date
 
@@ -19,17 +19,17 @@ The base model uses a ViT-B/32 Transformer architecture as an image encoder and
 
 ### Model Version
 
-We release several checkpoints for `clip-rsicd` model. Refer to [our github repo](https://github.com/arampacha/CLIP-rsicd) for zero-shot classification for each of those.
+We release several checkpoints for `clip-rsicd` model. Refer to [our github repo](https://github.com/arampacha/CLIP-rsicd#evaluation-results) for performance metrics on zero-shot classification for each of those.
 
 ### Training
 
 To reproduce the fine-tuning procedure one can use released [script](https://github.com/arampacha/CLIP-rsicd/blob/master/run_clip_flax_tv.py).
 The model was trained using batch size 1024, adafactor optimizer with linear warmup and decay with peak learning rate 1e-4 on 1 TPU-v3-8.
-Full log of the training run
+Full log of the training run can be found on [WandB](https://wandb.ai/wandb/hf-flax-clip-rsicd/runs/2dj1exsw).
 
 ### Demo
 
-
+Check out the model text-to-image and image-to-image capabilities using [this demo](https://huggingface.co/spaces/sujitpal/clip-rsicd-demo).
 
 
 ### Documents
@@ -67,7 +67,12 @@ for l, p in zip(labels, probs[0]):
 
 ### Intended Use
 
-The model is intended as a research output for research communities. We hope that this model will enable researchers to better understand and explore zero-shot, arbitrary image classification.
+The model is intended as a research output for research communities. We hope that this model will enable researchers to better understand and explore zero-shot, arbitrary image classification.
+
+In addition, we can imagine applications in defense and law enforcement, climate change and global warming, and even some consumer applications. A partial list of applications can be found [here](https://github.com/arampacha/CLIP-rsicd#applications). In general we think such models can be useful as digital assistants for humans engaged in searching through large collections of images.
+
+We also hope it can be used for interdisciplinary studies of the potential impact of such models - the CLIP paper includes a discussion of potential downstream impacts to provide an example for this sort of analysis.
+
 
 #### Primary intended uses
 
@@ -79,7 +84,8 @@ We primarily imagine the model will be used by researchers to better understand
 
 ## Data
 
-The model was trained on publicly available remote sensing image
+The model was trained on publicly available remote sensing image captions datasets. Namely [RSICD](https://github.com/201528014227051/RSICD_optimal), [UCM](https://mega.nz/folder/wCpSzSoS#RXzIlrv--TDt3ENZdKN8JA) and [Sydney](https://mega.nz/folder/pG4yTYYA#4c4buNFLibryZnlujsrwEQ). More information on the datasets used can be found on [our project page](https://github.com/arampacha/CLIP-rsicd#dataset).
+
 
 ## Performance and Limitations
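The Training section above fixes the optimization recipe (Adafactor, batch size 1024, linear warmup and decay, peak learning rate 1e-4 on one TPU-v3-8). Below is a minimal optax sketch of that learning-rate schedule and optimizer; the warmup and total step counts are placeholder assumptions rather than values from the run, and the released `run_clip_flax_tv.py` script plus the WandB log remain the authoritative configuration.

```python
import optax

# peak_lr comes from the model card; the step counts below are placeholders,
# not the values used in the actual run (see run_clip_flax_tv.py / the WandB log).
peak_lr = 1e-4
warmup_steps = 1_000
total_steps = 10_000

# Linear warmup from 0 to peak_lr, followed by linear decay back to 0.
lr_schedule = optax.join_schedules(
    schedules=[
        optax.linear_schedule(init_value=0.0, end_value=peak_lr, transition_steps=warmup_steps),
        optax.linear_schedule(init_value=peak_lr, end_value=0.0, transition_steps=total_steps - warmup_steps),
    ],
    boundaries=[warmup_steps],
)

# Adafactor driven by the schedule, matching the setup described in the Training section.
optimizer = optax.adafactor(learning_rate=lr_schedule)
```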
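The Model Details section describes zero-shot image classification and text-to-image retrieval on remote sensing images, and the context line of the third hunk (`for l, p in zip(labels, probs[0]):`) comes from the card's own usage example. The sketch below shows the same pattern with the `transformers` CLIP classes; the checkpoint id, label list, and image path are illustrative assumptions, not values taken from the card.

```python
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Checkpoint id is an assumption; pick one of the released clip-rsicd checkpoints.
model_id = "flax-community/clip-rsicd-v2"
model = CLIPModel.from_pretrained(model_id)
processor = CLIPProcessor.from_pretrained(model_id)

# Hypothetical label set and local image; substitute your own remote sensing image.
labels = ["residential area", "playground", "stadium", "forest", "airport"]
image = Image.open("airport.jpg")

inputs = processor(
    text=[f"an aerial photograph of {label}" for label in labels],
    images=image,
    return_tensors="pt",
    padding=True,
)
# logits_per_image has shape (num_images, num_texts); softmax gives per-label probabilities.
probs = model(**inputs).logits_per_image.softmax(dim=-1)
for l, p in zip(labels, probs[0]):
    print(f"{l:<20} {p.item():.4f}")
```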