+---
+# For reference on model card metadata, see the spec: https://github.com/huggingface/hub-docs/blob/main/modelcard.md?plain=1
+# Doc / guide: https://huggingface.co/docs/hub/model-cards
+{}
+---
+# Model Card: ClipMD
+## Model Details
+ClipMD is a medical image-text matching model based on OpenAI's CLIP model with a sliding window text encoder.
+### Model Description
+The model uses a ViT-B/32 Transformer as its image encoder and a masked sliding-window self-attention Transformer as its text encoder. The two encoders are trained with a contrastive loss to maximize the similarity of matching (image, text) pairs.
+The model was fine-tuned on the ROCO dataset.
+## Use with Transformers
+```python
+from PIL import Image
+from transformers import AutoModel, AutoProcessor
+
+# Load the fine-tuned model and its matching processor from the Hub
+model = AutoModel.from_pretrained("Idan0405/ClipMD")
+processor = AutoProcessor.from_pretrained("Idan0405/ClipMD")
+
+image = Image.open("your image path")  # replace with the path to your image
+# Tokenize the candidate captions and preprocess the image into one batch
+inputs = processor(text=["chest x-ray", "head MRI"], images=image, return_tensors="pt", padding=True)
+
+outputs = model(**inputs)
+logits_per_image = outputs[0]  # image-text similarity scores, one row per image
+probs = logits_per_image.softmax(dim=1)  # softmax over the captions gives label probabilities
+```
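For readers who want to see what the similarity scores and the contrastive objective mentioned in the Model Description amount to, here is a minimal pure-Python sketch. It assumes the symmetric cross-entropy formulation popularized by the original CLIP; the card only says "contrastive loss", so the exact loss used to train ClipMD may differ. The helper names, the toy 2-D embeddings, and the `logit_scale` value are all made up for illustration and are not part of the model's API.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def similarity_logits(image_embs, text_embs, logit_scale=10.0):
    """Return a matrix of scaled cosine similarities, one row per image.

    logit_scale is an assumed temperature; real models learn this value.
    """
    imgs = [l2_normalize(v) for v in image_embs]
    txts = [l2_normalize(v) for v in text_embs]
    return [
        [logit_scale * sum(a * b for a, b in zip(img, txt)) for txt in txts]
        for img in imgs
    ]


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    top = max(row)
    exps = [math.exp(x - top) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def contrastive_loss(logits):
    """Symmetric cross-entropy: pair i (image i, text i) is the correct match."""
    n = len(logits)
    # Image-to-text direction: each row should put its mass on the diagonal.
    image_to_text = -sum(math.log(softmax(logits[i])[i]) for i in range(n)) / n
    # Text-to-image direction: same idea on the transposed matrix.
    columns = [list(col) for col in zip(*logits)]
    text_to_image = -sum(math.log(softmax(columns[j])[j]) for j in range(n)) / n
    return (image_to_text + text_to_image) / 2


# Toy embeddings (hypothetical): image 0 matches text 0, image 1 matches text 1.
image_embs = [[1.0, 0.1], [0.1, 1.0]]
text_embs = [[0.9, 0.2], [0.2, 0.9]]

logits = similarity_logits(image_embs, text_embs)
probs = [softmax(row) for row in logits]  # per-image label probabilities
loss_matched = contrastive_loss(logits)
# Swapping the captions breaks the pairing, so the loss should go up.
loss_swapped = contrastive_loss(similarity_logits(image_embs, text_embs[::-1]))
```

In this sketch, each row of `probs` plays the same role as `probs` in the Transformers snippet: it distributes one image's probability mass over the candidate captions. Training pushes the diagonal of the logits matrix up relative to the off-diagonal entries, which is why the correctly paired batch yields a lower loss than the swapped one.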