# CMRCLIP

A CMR-report contrastive model combining Vision Transformers and pretrained text encoders.
## Model Overview
CMRCLIP encodes CMR (cardiac magnetic resonance) images and clinical reports into a shared embedding space for retrieval, similarity scoring, and downstream tasks. It uses:

- A pretrained text encoder (`Bio_ClinicalBERT`)
- A video encoder built on Vision Transformers (`SpaceTimeTransformer`)
- A lightweight projection head that maps both modalities into a common vector space
This repository contains only the trained weights and minimal configuration needed to load and run the model.
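Because both encoders project into the same space, retrieval and similarity scoring reduce to cosine similarity between projected vectors. The sketch below is purely illustrative: random tensors stand in for the encoder outputs, and the shapes (batch sizes, the 512-dim projection) are taken from the config below rather than from a documented API.

```python
import torch
import torch.nn.functional as F

# Hypothetical batches of projected embeddings: 4 CMR studies and 8 report
# snippets, each already mapped into the shared 512-dim space.
video_emb = torch.randn(4, 512)
text_emb = torch.randn(8, 512)

# L2-normalize so dot products equal cosine similarities.
video_emb = F.normalize(video_emb, dim=-1)
text_emb = F.normalize(text_emb, dim=-1)

# Similarity matrix: entry [i, j] scores study i against report j.
sim = video_emb @ text_emb.T  # shape (4, 8)

# For retrieval, rank reports for each study by descending similarity.
best_match = sim.argmax(dim=-1)
```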
## Files
- `config.json` – Model hyperparameters & architecture settings
- `pytorch_model.bin` – Saved PyTorch `state_dict` of the trained model
## Usage Example
Below is a minimal example of how to download and load the model using the Hugging Face Hub:
```bash
# Clone the repository (model code and requirements)
git clone git@github.com:Makiya11/CMRCLIP.git
cd CMRCLIP

# Install dependencies
pip install -r requirements.txt
```
```python
import json

import torch
from huggingface_hub import hf_hub_download

from model.cmrclip import CMRCLIP


# 1. Download artifacts from the Hugging Face Hub
def _download_file(filename):
    return hf_hub_download(
        repo_id="makiyeah/CMRCLIP",
        filename=filename,
    )


config_file = _download_file("config.json")
weights_file = _download_file("pytorch_model.bin")

# 2. Load config & model
with open(config_file, "r") as f:
    cfg = json.load(f)

model = CMRCLIP(
    video_params=cfg["video_params"],
    text_params=cfg["text_params"],
    projection_dim=cfg.get("projection_dim", 512),
    load_checkpoint=cfg.get("load_checkpoint"),
    projection=cfg.get("projection", "minimal"),
)

# Load weights on CPU so this also works on machines without a GPU
state_dict = torch.load(weights_file, map_location="cpu")
model.load_state_dict(state_dict)
model.eval()
```
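Once loaded, the model can embed a CMR clip and a report and score their agreement. The snippet below continues from the loading code and is a sketch, not a documented API: the tokenizer follows the text backbone named in the config, but the input dict keys (`"video"`, `"text"`), the clip shape (batch, frames=64, 3, 224, 224), and the forward pass returning a (text embedding, video embedding) pair are assumptions that may need adapting to the actual `CMRCLIP` signature.

```python
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer

# Tokenizer matching the text backbone named in config.json
tokenizer = AutoTokenizer.from_pretrained("emilyalsentzer/Bio_ClinicalBERT")

# Dummy clip; the shape is an assumption based on the config
# (num_frames=64, base_patch16_224 => 224x224 RGB frames).
video = torch.randn(1, 64, 3, 224, 224)
text = tokenizer(
    "Severely reduced left ventricular systolic function.",
    return_tensors="pt",
    padding=True,
    truncation=True,
)

with torch.no_grad():
    # Assumed forward signature: a dict with 'video' and 'text' entries,
    # returning projected text and video embeddings.
    text_emb, video_emb = model({"video": video, "text": text})

# Cosine similarity between report and clip in the shared space
score = F.cosine_similarity(text_emb, video_emb)
print(score)
```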
## Configuration (`config.json`)
```json
{
  "video_params": {
    "model": "SpaceTimeTransformer",
    "arch_config": "base_patch16_224",
    "num_frames": 64,
    "pretrained": true,
    "time_init": "zeros"
  },
  "text_params": {
    "model": "emilyalsentzer/Bio_ClinicalBERT",
    "pretrained": true,
    "input": "text"
  },
  "projection": "minimal",
  "projection_dim": 512,
  "load_checkpoint": ""
}
```
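The same file can drive preprocessing so that inputs stay consistent with what the encoders expect. A minimal sketch reusing the `cfg` dict from the usage example; the interpretation of each field (e.g. that `num_frames` is the clip length and the trailing `224` in `arch_config` is the frame resolution) is inferred from the names rather than documented:

```python
# Derive preprocessing constants from the same config the model was built
# with. Field interpretations are inferences from the names, not guarantees.
num_frames = cfg["video_params"]["num_frames"]   # frames per input clip (64)
arch = cfg["video_params"]["arch_config"]        # "base_patch16_224"
frame_size = int(arch.rsplit("_", 1)[-1])        # 224 => 224x224 frames
text_backbone = cfg["text_params"]["model"]      # "emilyalsentzer/Bio_ClinicalBERT"
```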
## License
This model is released under the MIT license. See LICENSE for details.
## Citation
If you use this model in your work, please cite:
```bibtex
@misc{cmrclip2025,
  title={CMR-CLIP: Contrastive Language Image Pretraining for a Cardiac Magnetic Resonance Image Embedding with Zero-shot Capabilities},
  author={Makiya Nakashima and Jielin Qiu and Peide Huang and Po-Hao Chen and Richard Grimm and Christopher Nguyen and Byung-Hak Kim and Ding Zhao and Deborah Kwon and David Chen},
  year={2025},
}
```