---
library_name: transformers
tags:
- image-geolocation
- geolocation
- geography
- geoguessr
- multi-modal
license: cc-by-nc-4.0
language:
- en
base_model: openai/clip-vit-large-patch14-336
pipeline_tag: zero-shot-image-classification
---

# Model Card for Thesis-CLIP-geoloc-continent

CLIP-ViT model fine-tuned for image geolocation, optimized for country-level queries.

## Model Details

### Model Description

- **Developed by:** [jrheiner](https://huggingface.co/jrheiner)
- **Model type:** CLIP-ViT
- **Language(s) (NLP):** English
- **License:** Creative Commons Attribution Non Commercial 4.0
- **Finetuned from model:** [openai/clip-vit-large-patch14-336](https://huggingface.co/openai/clip-vit-large-patch14-336)

### Model Sources

- **Repository:** https://github.com/jrheiner/thesis-appendix
- **Demo:** [Image Geolocation Demo Space](https://huggingface.co/spaces/jrheiner/thesis-demo)

## How to Get Started with the Model

```python
from PIL import Image
import requests
from transformers import CLIPProcessor, CLIPModel

model = CLIPModel.from_pretrained("jrheiner/thesis-clip-geoloc-continent")
processor = CLIPProcessor.from_pretrained("jrheiner/thesis-clip-geoloc-continent")

# Example street-level image from the demo Space
url = "https://huggingface.co/spaces/jrheiner/thesis-demo/resolve/main/kerger-test-images/Oceania_Australia_-32.947127313081_151.47903359833_kerger.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# Candidate country labels (some transcontinental countries appear twice, once per continent)
choices = ["Botswana", "Eswatini", "Ghana", "Kenya", "Lesotho", "Nigeria", "Senegal", "South Africa", "Rwanda", "Uganda", "Tanzania", "Madagascar", "Djibouti", "Mali", "Libya", "Morocco", "Somalia", "Tunisia", "Egypt", "Réunion", "Bangladesh", "Bhutan", "Cambodia", "China", "India", "Indonesia", "Israel", "Japan", "Jordan", "Kyrgyzstan", "Laos", "Malaysia", "Mongolia", "Nepal", "Palestine", "Philippines", "Singapore", "South Korea", "Sri Lanka", "Taiwan", "Thailand", "United Arab Emirates", "Vietnam", "Afghanistan", "Azerbaijan", "Cyprus", "Iran", "Syria", "Tajikistan", "Turkey", "Russia", "Pakistan", "Hong Kong", "Albania", "Andorra", "Austria", "Belgium", "Bulgaria", "Croatia", "Czechia", "Denmark", "Estonia", "Finland", "France", "Germany", "Greece", "Hungary", "Iceland", "Ireland", "Italy", "Latvia", "Lithuania", "Luxembourg", "Montenegro", "Netherlands", "North Macedonia", "Norway", "Poland", "Portugal", "Romania", "Russia", "Serbia", "Slovakia", "Slovenia", "Spain", "Sweden", "Switzerland", "Ukraine", "United Kingdom", "Bosnia and Herzegovina", "Cyprus", "Turkey", "Greenland", "Faroe Islands", "Canada", "Dominican Republic", "Guatemala", "Mexico", "United States", "Bahamas", "Cuba", "Panama", "Puerto Rico", "Bermuda", "Greenland", "Australia", "New Zealand", "Fiji", "Papua New Guinea", "Solomon Islands", "Vanuatu", "Argentina", "Bolivia", "Brazil", "Chile", "Colombia", "Ecuador", "Paraguay", "Peru", "Uruguay"]

inputs = processor(text=choices, images=image, return_tensors="pt", padding=True)
outputs = model(**inputs)
logits_per_image = outputs.logits_per_image  # image-text similarity scores
probs = logits_per_image.softmax(dim=1)  # softmax over the labels gives per-label probabilities
print(choices[probs.argmax().item()])  # top-1 predicted country
```
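
The same query can also go through the high-level `pipeline` API, matching the card's `pipeline_tag` of `zero-shot-image-classification`. A minimal sketch, reusing the `url` and `choices` defined above:

```python
from transformers import pipeline

classifier = pipeline("zero-shot-image-classification", model="jrheiner/thesis-clip-geoloc-continent")
results = classifier(url, candidate_labels=choices)  # accepts a URL, file path, or PIL image
print(results[0])  # highest-scoring candidate; results are {'score', 'label'} dicts sorted by descending score
```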

## Training Details

The model was fine-tuned on 177,270 images (29,545 per continent, across six continents) sourced from Mapillary.
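
For illustration only, here is a hypothetical sketch of the standard CLIP contrastive fine-tuning step that this kind of model is typically trained with. It is not the author's published recipe: the base checkpoint is the one listed above, but the optimizer, learning rate, prompt format, and batch assembly are all assumptions.

```python
# Hypothetical fine-tuning sketch, NOT the author's published training recipe.
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-large-patch14-336")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-large-patch14-336")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-6)  # assumed hyperparameters

def training_step(images, location_names):
    """One contrastive step on a batch of (image, location label) pairs.

    Assumes location names within a batch are unique, since CLIP's symmetric
    contrastive loss treats every other text in the batch as a negative.
    """
    inputs = processor(text=location_names, images=images, return_tensors="pt", padding=True)
    outputs = model(**inputs, return_loss=True)  # built-in symmetric CLIP loss
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return outputs.loss.item()
```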