Model Card for CXformer
CXformer is a vision transformer tailored for chest X-ray analysis, adapted from DINOv2 with clinically motivated training modifications. This repository provides code for pretraining CXformer with our optimized pipeline, as well as scripts for finetuning on downstream tasks such as classification, segmentation, and report generation. For more details on pretraining, please check out our paper, accepted at MIDL 2025.
- Finetuned from model: facebook/dinov2-with-registers-small
- License: CC BY-NC 4.0, with use limited to research purposes only.
Key highlights:
- Improved training with register tokens, teacher centering, and optimized attention heads.
- Self-supervised pretraining on 600K+ CXRs from 5 global datasets.
- Strong generalization across 3 core tasks: classification, segmentation, and report generation.
- CXformer(S) matches the performance of RAD-DINO with 7× less training compute (in FLOPs).
- Models are available on Hugging Face 🤗: CXformer(B), CXformer(S)
- Training and inference codebase: CXformer-python-library
- Paper: Empirical Analysis of Scaling Vision Foundation Models for Chest X-rays (MIDL 2025)
Pretrain Dataset
CXformer was pretrained on publicly available datasets, focusing on frontal views of chest X-rays (PA/AP):
- CheXpert
- MIMIC-CXR
- PadChest
- NIH-CXR8
- BRAX
The official training splits were used for CheXpert, MIMIC-CXR, and NIH-CXR8; all available samples from BRAX and PadChest were used for pretraining.
Downstream Tasks
Task | Dataset(s) |
---|---|
Image Classification | CheXpert, NIH-CXR8, RSNA, VinDr |
Segmentation | CheXmask |
Report Generation | MIMIC-CXR, IU-Xray |
Usage
from transformers import AutoModel, AutoImageProcessor
from PIL import Image
model_name = "m42-health/CXformer-base"
image_processor = AutoImageProcessor.from_pretrained(model_name,trust_remote_code=True)
model = AutoModel.from_pretrained(model_name)
model.eval()
image = Image.open('sample_cxr.png')
image = image_processor(image, return_tensors='pt')
print(image['pixel_values'].shape) # [1,3,518,518]
print("Doing forwardpass...")
output = model(**image).last_hidden_state # [1, 1374, 768]
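The output sequence can be pooled into a single image embedding. Below is a minimal sketch, continuing from the snippet above and assuming the DINOv2-with-registers token layout (index 0 is the CLS token, indices 1–4 are register tokens, and the remaining 1369 tokens form the 37×37 patch grid):

```python
import torch

# Token layout assumed from the DINOv2-with-registers convention:
# [CLS] + 4 register tokens + 37*37 patch tokens.
with torch.no_grad():
    hidden = model(**inputs).last_hidden_state  # [1, 1374, 768]

cls_embedding = hidden[:, 0]           # [1, 768] global image embedding
patch_tokens = hidden[:, 5:]           # [1, 1369, 768] spatial tokens
mean_embedding = patch_tokens.mean(1)  # [1, 768] average-pooled alternative
```

Either pooled embedding can serve as a frozen feature for downstream probes.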
Results Summary
Classification (AUROC)
Model | CheXpert | RSNA | NIH-CXR8 | Avg. |
---|---|---|---|---|
CXformer(S) | 83.34 | 91.13 | 83.68 | 86.05 |
CXformer(B) | 86.80 | 91.71 | 85.28 | 87.93 |
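A probe in the spirit of these results feeds a frozen CXformer embedding to a small classification head. The sketch below is hedged: the 14-label head, BCE loss, and placeholder targets are illustrative assumptions, and the paper's exact probing protocol may differ. It reuses `model` and `inputs` from the usage snippet.

```python
import torch
import torch.nn as nn

# A linear probe over the frozen CLS embedding. NUM_CLASSES=14 and the
# BCE loss mirror CheXpert-style multi-label targets (assumption).
NUM_CLASSES = 14

probe = nn.Linear(model.config.hidden_size, NUM_CLASSES)
criterion = nn.BCEWithLogitsLoss()

model.requires_grad_(False)  # keep the backbone frozen
with torch.no_grad():
    features = model(**inputs).last_hidden_state[:, 0]  # [1, 768] CLS

logits = probe(features)           # [1, 14]
labels = torch.zeros_like(logits)  # placeholder multi-label targets
loss = criterion(logits, labels)
loss.backward()                    # gradients flow only into the probe
```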
Segmentation (Dice Score)
Model | Lungs | Heart | Avg. |
---|---|---|---|
CXformer(S) | 91.69 | 89.35 | 90.52 |
CXformer(B) | 91.94 | 89.94 | 90.94 |
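For dense prediction, the patch tokens can be reshaped into a 2D feature map for a segmentation decoder. A minimal sketch under the same token-layout assumption (CLS + 4 registers first), reusing `model` and `inputs` from above:

```python
import torch

# Reshape patch tokens into a 2D grid for a segmentation decoder.
# With 518x518 inputs and 14x14 patches the grid is 37x37; the first
# 5 tokens (CLS + 4 registers) are skipped, as assumed above.
with torch.no_grad():
    patch_tokens = model(**inputs).last_hidden_state[:, 5:]  # [1, 1369, 768]

b, n, c = patch_tokens.shape
h = w = int(n ** 0.5)  # 37
feature_map = patch_tokens.transpose(1, 2).reshape(b, c, h, w)
print(feature_map.shape)  # [1, 768, 37, 37]
```

An upsampling decoder (e.g., a few transposed convolutions) can then map this grid to per-pixel lung and heart masks.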
Report Generation (MIMIC-CXR)
Model | ROUGE-L | BLEU-4 | RGER | F1-14 | Avg. |
---|---|---|---|---|---|
CXformer(S) | 25.25 | 9.11 | 23.06 | 33.85 | 27.51 |
CXformer(B) | 24.93 | 9.03 | 22.94 | 33.45 | 27.16 |
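For report generation, a common pattern is to project the patch embeddings into a text decoder's embedding space and consume them as a visual prefix or cross-attention memory. The sketch below is an illustrative assumption, not the paper's exact architecture; the decoder hidden size and projection layer are hypothetical.

```python
import torch
import torch.nn as nn

# Project patch embeddings into a (hypothetical) text decoder's
# embedding space for use as a visual prefix. DECODER_HIDDEN and the
# projection are illustrative assumptions, not the paper's setup.
DECODER_HIDDEN = 768

visual_projection = nn.Linear(model.config.hidden_size, DECODER_HIDDEN)

with torch.no_grad():
    patch_tokens = model(**inputs).last_hidden_state[:, 5:]  # [1, 1369, 768]

visual_prefix = visual_projection(patch_tokens)  # [1, 1369, DECODER_HIDDEN]
# visual_prefix can be prepended to the decoder's token embeddings or
# consumed via cross-attention to condition report generation.
```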
Disclaimer
CXformer is intended exclusively for research purposes. It is not validated for clinical decision-making, nor is it approved for use in healthcare environments. The model should not be used for any diagnostic or therapeutic applications in a clinical setting.
License
This project is licensed under CC BY-NC 4.0.
Citation
```bibtex
@inproceedings{al2025empirical,
  title={Empirical Analysis of Scaling Vision Foundation Models for Chest X-rays},
  author={Al Mahrooqi, Ahmed and Munjal, Prateek and Rajan, Ronnie and Pimentel, Marco AF and Kanithi, Praveenkumar},
  booktitle={Medical Imaging with Deep Learning},
  year={2025}
}
```