---
license: mit
tags:
- multimodal
- medical
- cardiac
- cmr
- clip
- contrastive-learning
- vision-transformer
- clinical-bert
library_name: pytorch
pipeline_tag: feature-extraction
datasets:
- medical
language:
- en
---

# CMRCLIP

> A CMR-report contrastive model combining Vision Transformers and pretrained text encoders.

![CMRCLIP Model Overview](figs/overview.png)

---

## Model Overview

**CMRCLIP** encodes cardiac magnetic resonance (CMR) images and clinical reports into a shared embedding space for retrieval, similarity scoring, and downstream tasks. It uses:

* A pretrained text encoder (`Bio_ClinicalBERT`)
* A video encoder built on Vision Transformers (`SpaceTimeTransformer`)
* A lightweight projection head that maps both modalities into a common vector space

This repository contains only the trained weights and the minimal configuration needed to load and run the model.

---

## Files

* `config.json` — Model hyperparameters & architecture settings
* `pytorch_model.bin` — Saved PyTorch `state_dict` of the trained model

---

## Usage Example

Below is a minimal example of how to download and load the model using the Hugging Face Hub:

```bash
# Clone the repository
git clone git@github.com:Makiya11/CMRCLIP.git
cd CMRCLIP

# Install dependencies
pip install -r requirements.txt
```

```python
import json

import torch
from huggingface_hub import hf_hub_download

from model.cmrclip import CMRCLIP

# 1. Download artifacts
def _download_file(filename):
    return hf_hub_download(
        repo_id="makiyeah/CMRCLIP",
        filename=filename,
    )

config_file = _download_file("config.json")
weights_file = _download_file("pytorch_model.bin")

# 2. Load config & model
with open(config_file, "r") as f:
    cfg = json.load(f)

model = CMRCLIP(
    video_params=cfg["video_params"],
    text_params=cfg["text_params"],
    projection_dim=cfg.get("projection_dim", 512),
    load_checkpoint=cfg.get("load_checkpoint"),
    projection=cfg.get("projection", "minimal"),
)

# Load on CPU so the example also works without a GPU
state_dict = torch.load(weights_file, map_location="cpu")
model.load_state_dict(state_dict)
model.eval()
```

A hedged sketch of computing video-text similarity with the loaded model appears at the end of this card.

---

## Configuration (`config.json`)

```json
{
  "video_params": {
    "model": "SpaceTimeTransformer",
    "arch_config": "base_patch16_224",
    "num_frames": 64,
    "pretrained": true,
    "time_init": "zeros"
  },
  "text_params": {
    "model": "emilyalsentzer/Bio_ClinicalBERT",
    "pretrained": true,
    "input": "text"
  },
  "projection": "minimal",
  "projection_dim": 512,
  "load_checkpoint": ""
}
```

---

## License

This model is released under the **MIT** license. See [LICENSE](LICENSE) for details.

---

## Citation

If you use this model in your work, please cite:

```bibtex
@misc{cmrclip2025,
  title={CMR-CLIP: Contrastive Language Image Pretraining for a Cardiac Magnetic Resonance Image Embedding with Zero-shot Capabilities},
  author={Makiya Nakashima and Jielin Qiu and Peide Huang and Po-Hao Chen and Richard Grimm and Christopher Nguyen and Byung-Hak Kim and Ding Zhao and Deborah Kwon and David Chen},
  year={2025},
}
```

---
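## Example: Computing Video-Text Similarity (Sketch)

Once the model is loaded as in the usage example above, embeddings from both modalities can be compared in the shared projection space. The sketch below is illustrative only: the method names `compute_video` and `compute_text` and the video tensor layout `(batch, frames, channels, height, width)` are assumptions borrowed from similar CLIP-style video-text codebases, not a confirmed CMRCLIP API; consult `model/cmrclip.py` for the actual entry points.

```python
# Minimal inference sketch. ASSUMPTIONS: `model` is the loaded CMRCLIP
# instance from the usage example; `compute_video` / `compute_text` are
# hypothetical embedding methods; video layout is (B, T, C, H, W).
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer

# Tokenizer matching the text encoder named in config.json
tokenizer = AutoTokenizer.from_pretrained("emilyalsentzer/Bio_ClinicalBERT")

# Dummy CMR clip: 1 sample, 64 frames, 3 channels, 224x224 (per config.json)
video = torch.randn(1, 64, 3, 224, 224)

reports = ["Normal left ventricular size and systolic function."]
text = tokenizer(reports, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    video_emb = model.compute_video(video)  # assumed method name
    text_emb = model.compute_text(text)     # assumed method name

# Cosine similarity between L2-normalized projected embeddings
video_emb = F.normalize(video_emb, dim=-1)
text_emb = F.normalize(text_emb, dim=-1)
similarity = video_emb @ text_emb.T
print(similarity)
```

Normalizing before the dot product makes the score a cosine similarity in [-1, 1], which is the standard way CLIP-style models rank report-image pairs for retrieval.

---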