EAGLE
The use of artificial intelligence (AI) models to develop computational biomarkers from H&E-stained digital histopathology images has emerged as a promising diagnostic approach for enhancing the clinical management of cancer patients. Computational biomarkers offer several advantages: 1) they are deployed digitally, 2) they are cost-effective, and 3) they do not consume tissue. Despite numerous promising models in the literature, their clinical utility in real-world settings has yet to be established. Assessment of EGFR mutations in lung adenocarcinoma is challenged by the need for rapid, accurate results at low cost while preserving tissue for comprehensive genomic sequencing. Polymerase chain reaction (PCR)-based assays provide rapid results but are less accurate than genomic sequencing and deplete the tissue. Highly accurate and robust computational biomarkers, aided by modern foundation models, can fill this niche. We compiled the largest international, multi-institutional clinical cohort of digital histopathology images of lung adenocarcinomas (N=8,461 cases/slides) to develop and validate a state-of-the-art computational EGFR biomarker. The model utilizes an open-source foundation model that is fine-tuned for the task of EGFR classification. We demonstrate that fine-tuning the foundation model yields improved task-specific performance that generalizes across institutions and scanning protocols with clinical-level performance (mean AUC: 0.847 internal, 0.870 external). To support translation into the clinic and to investigate in-real-time (IRT) usability, we conducted a first-of-its-kind prospective silent trial of a computational biomarker on primary samples, achieving an AUC of 0.896. We demonstrate that an AI-assisted rapid EGFR screening workflow reduces the amount of rapid testing needed by up to 43% while maintaining clinical-standard performance.
The retrospective and prospective results demonstrate for the first time the clinical utility and efficacy of an H&E-based computational biomarker in a real-world clinical setting.
Model
The model consists of: 1) a 1.1-billion-parameter vision transformer (ViT-g) that encodes high-resolution (20x magnification, 0.5 microns per pixel) 224×224-pixel patches into 1,536-dimensional feature vectors; 2) a gated MIL attention (GMA) aggregator that integrates all encoded patches from a slide into a global slide-level feature representation; and 3) a linear classifier that outputs the probability of an EGFR mutation from that slide-level representation. The tile encoder was initialized with GigaPath weights. The model was trained end-to-end for the task of predicting EGFR mutational status from H&E slides.
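The gated MIL attention aggregator and linear head described above can be sketched as follows. This is a minimal illustrative implementation of gated attention pooling over tile features; the class name, hidden size, and layer layout are assumptions for illustration, not EAGLE's released code:

```python
import torch
import torch.nn as nn

class GatedAttentionMIL(nn.Module):
    """Illustrative gated MIL attention (GMA) aggregator plus linear classifier.
    Sizes and names are assumptions; see the EAGLE repository for the real model."""

    def __init__(self, feat_dim=1536, hidden_dim=256):
        super().__init__()
        self.attn_v = nn.Sequential(nn.Linear(feat_dim, hidden_dim), nn.Tanh())
        self.attn_u = nn.Sequential(nn.Linear(feat_dim, hidden_dim), nn.Sigmoid())
        self.attn_w = nn.Linear(hidden_dim, 1)
        self.classifier = nn.Linear(feat_dim, 1)  # linear head -> EGFR logit

    def forward(self, tiles):
        # tiles: (n_tiles, feat_dim) encoded patches from one slide
        a = self.attn_w(self.attn_v(tiles) * self.attn_u(tiles))  # (n_tiles, 1)
        a = torch.softmax(a, dim=0)            # attention weights over tiles
        slide = (a * tiles).sum(dim=0)         # slide-level feature (feat_dim,)
        prob = torch.sigmoid(self.classifier(slide))
        return slide, a, prob

# Aggregate 100 hypothetical encoded tiles into one slide-level prediction
tiles = torch.randn(100, 1536)
agg = GatedAttentionMIL()
slide_feat, attn, p = agg(tiles)
```

The gating (Tanh branch multiplied by a Sigmoid branch) lets the attention network learn which tiles contribute to the slide-level representation, which is what allows a single slide label to supervise thousands of unlabeled patches.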
Model Usage
To get started, first clone the repository with this command:
git clone --no-checkout https://huggingface.co/MCCPBR/EAGLE && cd EAGLE
Now you can use the following code:
from PIL import Image
import numpy as np
import torch
import torchvision.transforms as transforms
import eagle

# Load the model and switch to inference mode
model = eagle.EAGLE()
model.eval()

# Set up the preprocessing transform (ImageNet normalization)
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225))
])

# Create a dummy 224x224 RGB image (replace with a real H&E tile)
img = np.random.randint(0, 256, size=(224, 224, 3), dtype=np.uint8)
img = Image.fromarray(img)
img = transform(img).unsqueeze(0)  # add batch dimension: (1, 3, 224, 224)

# Inference: encoded features, attention weights, and EGFR mutation probability
with torch.no_grad():
    h, att, p = model(img)
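The example above runs on a single random tile; in practice, a whole-slide image is first split into many 224×224 patches, each of which is preprocessed and encoded as shown. A minimal tiling sketch in pure NumPy (independent of the EAGLE API; `tile_image` is a hypothetical helper, not part of the repository):

```python
import numpy as np

def tile_image(arr, tile=224):
    """Split an HxWx3 array into non-overlapping tile x tile patches,
    dropping any partial patches at the right/bottom borders."""
    h, w, _ = arr.shape
    patches = [arr[y:y + tile, x:x + tile]
               for y in range(0, h - tile + 1, tile)
               for x in range(0, w - tile + 1, tile)]
    return np.stack(patches)

# A dummy 1024x1024 "slide" yields a 4x4 grid of 224x224 patches
slide = np.random.randint(0, 256, size=(1024, 1024, 3), dtype=np.uint8)
patches = tile_image(slide)
print(patches.shape)  # (16, 224, 224, 3)
```

Each patch can then be passed through `transform` and stacked into a batch before calling the model; real pipelines typically also filter out background (non-tissue) patches before encoding.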