---
license: mit
datasets:
  - vector-institute/open-pmc
metrics:
  - accuracy
  - f1
  - recall
---
# Open-PMC Pipeline

Arxiv: Arxiv     |     Code: Open-PMC Github     |     Dataset: Hugging Face

## Model Overview

This model is a checkpoint trained on the Open-PMC dataset. It uses a Vision Transformer (ViT-B/16) as the backbone for visual feature extraction and PubMedBERT for encoding text. The model is trained with contrastive learning using the standard InfoNCE loss, aligning image and text representations in a shared embedding space.
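To illustrate what such a shared embedding space enables, the sketch below scores image-text pairs by cosine similarity. This is a hypothetical NumPy illustration with toy vectors standing in for the ViT and PubMedBERT outputs, not the model's actual inference API:

```python
import numpy as np

def cosine_similarity_matrix(img_emb: np.ndarray, txt_emb: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarities between (N, D) image and (M, D) text embeddings."""
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    return img @ txt.T  # (N, M) matrix of similarity scores

# Toy embeddings standing in for encoder outputs (hypothetical shapes)
rng = np.random.default_rng(0)
image_embeddings = rng.normal(size=(4, 512))
text_embeddings = rng.normal(size=(4, 512))

sims = cosine_similarity_matrix(image_embeddings, text_embeddings)
best_caption = sims.argmax(axis=1)  # index of best-matching caption per image
```

Retrieval in either direction (image-to-text or text-to-image) reduces to ranking rows or columns of this similarity matrix.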

## Model Architecture

  • Vision Backbone: ViT-B/16 (Pretrained on ImageNet)
  • Text Backbone: PubMedBERT (Pretrained on PubMedCentral Abstracts)
  • Training Objective: Contrastive Learning with InfoNCE Loss
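The InfoNCE objective above treats each matched image-text pair in a batch as a positive and every other pairing as a negative. The following is a minimal NumPy sketch of that symmetric loss, for illustration only; it is not the mmlearn implementation and the temperature value is an assumption:

```python
import numpy as np

def info_nce_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired image/text embeddings.

    Row i of img_emb and txt_emb form a positive pair; all other pairings
    in the batch serve as negatives.
    """
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature        # (B, B) scaled similarities
    labels = np.arange(len(logits))           # positives sit on the diagonal

    def cross_entropy(l, y):
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[np.arange(len(y)), y].mean()

    # Average the image->text and text->image directions
    return 0.5 * (cross_entropy(logits, labels) + cross_entropy(logits.T, labels))

rng = np.random.default_rng(0)
loss = info_nce_loss(rng.normal(size=(8, 256)), rng.normal(size=(8, 256)))
```

Minimizing this loss pulls matched image-text embeddings together while pushing mismatched ones apart, which is what produces the cross-modal alignment described above.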

## Training Framework

The model was trained with the mmlearn framework, which is designed for multimodal learning research. More information is available in the mmlearn repository.

## How to Use

Please visit our GitHub repository for instructions on running benchmarks with this checkpoint.