---
license: mit
datasets:
- vector-institute/open-pmc
metrics:
- accuracy
- f1
- recall
---
# Open-PMC Pipeline

Paper: arXiv &nbsp;|&nbsp; Code: Open-PMC GitHub &nbsp;|&nbsp; Dataset: Hugging Face

## Model Overview

This model is a checkpoint trained on the **Open-PMC** dataset. It uses a **Vision Transformer (ViT-B/16)** as the backbone for visual feature extraction and **PubMedBERT** for processing text data. The model is trained with **contrastive learning** using the **vanilla InfoNCE loss** to learn meaningful representations across modalities.

## Model Architecture

- **Vision Backbone**: ViT-B/16 (pretrained on ImageNet)
- **Text Backbone**: PubMedBERT (pretrained on PubMed Central abstracts)
- **Training Objective**: Contrastive learning with **InfoNCE loss**

## Training Framework

The model was trained using the **mmlearn** framework, which is designed for multimodal learning. You can find more information and access the framework [here](https://github.com/vectorInstitute/mmlearn).

## How to Use

Please visit our GitHub repository for instructions on running benchmarks with this checkpoint.
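To make the training objective concrete, here is a minimal NumPy sketch of the symmetric InfoNCE loss used in contrastive image-text training. This is an illustrative standalone function, not the mmlearn implementation used to train this checkpoint; the function name, temperature value, and toy data are assumptions for demonstration only.

```python
import numpy as np

def info_nce_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired embeddings.

    img_emb, txt_emb: (N, D) arrays where row i of each is a matched
    image-text pair. Hypothetical sketch; the released checkpoint is
    trained with mmlearn, not this function.
    """
    # L2-normalize so dot products become cosine similarities
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature  # (N, N) similarity matrix

    def nll_diag(m):
        # Negative log-likelihood of the diagonal (positive) entries
        m = m - m.max(axis=1, keepdims=True)  # numerical stability
        log_probs = m - np.log(np.exp(m).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_probs))

    # Average the image-to-text and text-to-image directions
    return 0.5 * (nll_diag(logits) + nll_diag(logits.T))

# Toy check: perfectly aligned pairs should score lower than random ones
rng = np.random.default_rng(0)
img = rng.normal(size=(4, 8))
loss_random = info_nce_loss(img, rng.normal(size=(4, 8)))
loss_aligned = info_nce_loss(img, img)
print(loss_aligned < loss_random)
```

Each image embedding is pulled toward its paired text embedding (the diagonal of the similarity matrix) and pushed away from all other captions in the batch, and symmetrically for text.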