---
license: mit
tags:
- image-feature-extraction
- histology
- pathology
- vision
- pytorch
- self-supervised
- vit
- dino
language:
- en
metrics:
- accuracy
base_model:
- facebook/dinov2-giant
---

# Kaiko Midnight
Midnight - Training State-of-the-Art Pathology Foundation Models with Orders of Magnitude Less Data

This repository contains the checkpoints for **Midnight-12k**, the model presented in our paper "Training state-of-the-art pathology foundation models with orders of magnitude less data." Our approach achieves performance competitive with leading pathology foundation models (FMs) despite being trained on significantly fewer whole slide images (WSIs).

## Overview

We propose a refined self-supervised training framework based on DINOv2 with modifications that optimize model performance specifically for computational pathology. Our main contributions include:

- Three novel pathology FMs trained with significantly reduced data (up to 100x fewer WSIs).
- Introduction of high-resolution post-training to enhance embedding quality.

## Model Highlights

- **Midnight-12k**: Trained exclusively on the publicly available TCGA dataset (12k WSIs).
- **Midnight-92k**: Trained on TCGA and an additional proprietary dataset from the Netherlands Cancer Institute (NKI-80k).
- **Midnight-92k/392**: Our top-performing model fine-tuned with high-resolution post-training.


## Model Weights
- Midnight-12k: [Publicly available](https://huggingface.co/kaiko-ai/midnight/tree/main) under the permissive MIT license.
- Midnight-92k & Midnight-92k/392: Trained on proprietary datasets and subject to restricted access.


## Usage

Our models are trained on 224x224 images normalized with a mean of (0.5, 0.5, 0.5) and a standard deviation of (0.5, 0.5, 0.5). Please ensure you apply these exact normalization parameters when preparing your datasets for embedding extraction.

```python
from transformers import AutoModel
from PIL import Image
import requests
from torchvision.transforms import v2

url = 'https://upload.wikimedia.org/wikipedia/commons/8/80/Breast_DCIS_histopathology_%281%29.jpg'
image = Image.open(requests.get(url, stream=True).raw)

transform = v2.Compose(
    [
        v2.Resize(224),
        v2.CenterCrop(224),
        v2.ToTensor(),
        v2.Normalize(mean=(0.5, 0.5, 0.5), std=(0.5, 0.5, 0.5)),
    ]
)
model = AutoModel.from_pretrained('kaiko-ai/midnight')
```

### Extract embeddings for classification
The model returns one CLS token followed by 256 patch tokens (a 16x16 grid, since 224/14 = 16). For classification, we concatenate the CLS embedding with the mean of the patch embeddings:
```python
import torch

def extract_classification_embedding(tensor):
    cls_embedding, patch_embeddings = tensor[:, 0, :], tensor[:, 1:, :]
    return torch.cat([cls_embedding, patch_embeddings.mean(1)], dim=-1)

batch = transform(image).unsqueeze(dim=0)
embedding = extract_classification_embedding(model(batch).last_hidden_state)
print(f"Embedding shape: {embedding[0].shape}")
```
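
The concatenated CLS + mean-patch embedding can be fed directly to a lightweight downstream classifier. Below is a minimal sketch of a linear probe with scikit-learn; the `images` and `labels` placeholders stand in for your own labeled tile dataset (here we simply reuse the demo image with dummy labels), and batching/GPU handling is omitted for brevity.

```python
import numpy as np
import torch
from sklearn.linear_model import LogisticRegression

# Placeholder dataset: replace with your own tiles and labels.
images, labels = [image, image], [0, 1]

# Extract frozen embeddings with the model and transform defined above.
with torch.no_grad():
    feats = [
        extract_classification_embedding(model(transform(img).unsqueeze(0)).last_hidden_state)
        for img in images
    ]
X = torch.cat(feats).cpu().numpy()
y = np.asarray(labels)

# Fit a simple linear probe on top of the frozen embeddings.
clf = LogisticRegression(max_iter=1000).fit(X, y)
print(f"Train accuracy: {clf.score(X, y):.3f}")
```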

### Extract embeddings for segmentation

For segmentation tasks, the patch tokens are reshaped back into their 16x16 spatial grid (224/14 = 16):
```python
import math
import torch

def extract_segmentation_embedding(tensor):
    features = tensor[:, 1:, :].permute(0, 2, 1)  # drop the CLS token, move channels first
    batch_size, hidden_size, num_patches = features.shape
    height = width = int(math.sqrt(num_patches))
    # permute() returns a non-contiguous tensor, so use reshape() rather than view()
    return features.reshape(batch_size, hidden_size, height, width)

batch = transform(image).unsqueeze(dim=0)
embedding = extract_segmentation_embedding(model(batch).last_hidden_state)
print(f"Embedding shape: {embedding[0].shape}")
```
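
For dense prediction, the resulting 16x16 feature map is typically upsampled to the input resolution or passed to a segmentation decoder of your choice. A minimal sketch using bilinear interpolation (the decoder/upsampling choice is ours for illustration, not prescribed by the model):

```python
import torch.nn.functional as F

# Upsample the 16x16 patch-feature grid back to the 224x224 input resolution.
dense_features = F.interpolate(
    embedding, size=(224, 224), mode="bilinear", align_corners=False
)
print(f"Dense feature map shape: {dense_features[0].shape}")
```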


## Training Datasets

| Dataset | WSIs | Source        | Comment    | 
|---------|------|---------------|------------|
| TCGA    | 12k  | Public        | FFPE only  |
| NKI-80k | 80k  | Proprietary   | 10,141 patients, 31 organs |

## Training Components 

- **DINOv2**: Self-supervised training with [DINOv2](https://github.com/facebookresearch/dinov2).
- **KDE regularizer**: Replaces the KoLeo regularizer from DINOv2 to encourage embedding diversity and improve training stability (a rough sketch follows this list).
- **Online patching**: Efficient real-time extraction of informative tiles.
- **Color augmentation (HED)**: Robustness to stain variations.
- **Tile filtering**: Removal of low-informative tissue regions.
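
For intuition only, here is a simplified sketch of what a KDE-style diversity regularizer can look like: it estimates the density of L2-normalized embeddings with a Gaussian kernel and penalizes batches whose embeddings collapse together. This is our own illustrative reconstruction with an arbitrary bandwidth, not the exact loss used to train Midnight; see the paper for the precise formulation.

```python
import torch
import torch.nn.functional as F

def kde_diversity_loss(embeddings: torch.Tensor, bandwidth: float = 0.5) -> torch.Tensor:
    """Illustrative KDE-based diversity penalty (bandwidth is a placeholder)."""
    z = F.normalize(embeddings, dim=-1)                    # (batch, dim), unit-norm embeddings
    sq_dists = torch.cdist(z, z).pow(2)                    # pairwise squared distances
    kernel = torch.exp(-sq_dists / (2 * bandwidth ** 2))   # Gaussian kernel matrix
    density = kernel.mean(dim=1)                           # KDE estimate per sample
    return density.log().mean()                            # lower when embeddings are spread out

# Adding this term to the training loss discourages embedding collapse.
loss = kde_diversity_loss(torch.randn(64, 256))
print(loss.item())
```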

## Evaluation

We comprehensively evaluated the models using two sets of open-source benchmarks:

- [eva](https://github.com/kaiko-ai/eva): For both tile (classification, segmentation) and slide-level tasks.
- [HEST](https://github.com/mahmoodlab/HEST): For gene expression prediction tasks (regression).

Our best model, **Midnight-92k/392**, consistently matches or outperforms leading models such as Virchow2 and UNI-2.

## Results Summary

| Model | AVG. | PCam 10 shots | BACH | BRACS | BreaKHis | CRC | Gleason | MHIST | PCam | Cam16 (small) | Panda (small) | CoNSeP | MoNuSAC | HEST |
|-------|------|---------------|------|-------|----------|-----|---------|-------|------|---------------|---------------|--------|---------|------|
| **[Midnight-92k/392](#usage)** | **0.778** | **0.900** | **0.904** | **0.646** | 0.802 | 0.966 | **0.807** | 0.828 | **0.951** | 0.868 | 0.651 | **0.662** | **0.708** | 0.415 |
| [UNI-2](https://huggingface.co/MahmoodLab/UNI2-h) | **0.776** | **0.885** | **0.924** | **0.651** | **0.863** | **0.970** | 0.777 | 0.829 | **0.951** | **0.873** | **0.666** | 0.626 | 0.644 | **0.431** |
| **[Midnight-92k](#usage)** | **0.767** | **0.882** | 0.889 | 0.615 | 0.793 | **0.967** | **0.823** | 0.831 | 0.948 | **0.872** | 0.643 | 0.629 | 0.656 | **0.425** |
| [Virchow2](https://huggingface.co/paige-ai/Virchow2) | 0.766 | 0.835 | 0.890 | 0.633 | 0.818 | 0.966 | **0.791** | **0.865** | 0.938 | 0.860 | 0.646 | 0.640 | 0.674 | 0.403 |
| [**Midnight-12k**](#usage) | 0.763 | 0.803 | **0.907** | 0.639 | 0.840 | **0.967** | 0.790 | 0.815 | 0.931 | **0.869** | 0.656 | 0.625 | 0.664 | 0.412 |
| [Kaiko-B8](https://github.com/kaiko-ai/towards_large_pathology_fms) | 0.757 | 0.799 | 0.876 | 0.641 | **0.842** | 0.960 | 0.761 | 0.830 | 0.920 | 0.836 | 0.650 | **0.644** | 0.686 | 0.391 |
| [H-Optimus-0](https://huggingface.co/bioptimus/H-optimus-0) | 0.755 | 0.831 | 0.752 | 0.620 | 0.813 | 0.962 | 0.769 | **0.850** | 0.943 | 0.847 | **0.672** | **0.644** | **0.687** | **0.425** |
| [Prov_GigaPath](https://github.com/prov-gigapath/prov-gigapath) | 0.752 | 0.853 | 0.794 | 0.626 | **0.846** | 0.959 | 0.727 | 0.831 | 0.944 | 0.812 | 0.657 | 0.628 | **0.688** | 0.405 |
| [Hibou-L](https://huggingface.co/histai/hibou-L) | 0.751 | 0.825 | 0.792 | **0.643** | 0.767 | 0.954 | 0.766 | **0.850** | **0.949** | 0.852 | 0.654 | **0.646** | 0.668 | 0.397 |
| [UNI](https://huggingface.co/MahmoodLab/UNI) | 0.749 | 0.833 | 0.797 | 0.613 | 0.808 | 0.954 | 0.759 | 0.841 | 0.937 | 0.854 | **0.662** | 0.627 | 0.662 | 0.391 |
| [Phikon](https://huggingface.co/owkin/phikon) | 0.724 | 0.826 | 0.744 | 0.579 | 0.715 | 0.946 | 0.743 | 0.824 | 0.919 | 0.822 | 0.648 | 0.624 | 0.644 | 0.377 |
| [Phikon-v2](https://huggingface.co/owkin/phikon-v2) | 0.718 | 0.756 | 0.737 | 0.607 | 0.725 | 0.953 | 0.753 | 0.796 | 0.900 | 0.807 | 0.634 | 0.626 | 0.645 | 0.391 |
| [Lunit](https://github.com/lunit-io/benchmark-ssl-pathology) | 0.714 | 0.763 | 0.785 | 0.627 | 0.759 | 0.943 | 0.758 | 0.785 | 0.905 | 0.759 | 0.604 | 0.600 | 0.630 | 0.362 |
| [vitg14 (nat. img.)](https://github.com/facebookresearch/dinov2) | 0.674 | 0.721 | 0.724 | 0.578 | 0.783 | 0.943 | 0.740 | **0.855** | 0.881 | 0.500 | 0.509 | 0.565 | 0.614 | 0.351 |
| [vitg14 (initial)](https://github.com/facebookresearch/dinov2) | 0.493 | 0.652 | 0.474 | 0.413 | 0.425 | 0.754 | 0.459 | 0.578 | 0.763 | 0.526 | 0.304 | 0.462 | 0.432 | 0.166 |

## Citation

```bibtex
@article{KDK2025,
  title={Training state-of-the-art pathology foundation models with orders of magnitude less data},
  author={Mikhail Karasikov and Joost van Doorn and Nicolas Känzig and Melis Erdal Cesur and Hugo Mark Horlings and Robert Berke and Fei Tang and Sebastian Otálora},
  year={2025},
  journal={arXiv preprint arXiv:2504.05186},
  url={https://arxiv.org/abs/2504.05186},
}
```

<br />

![](https://github.com/kaiko-ai/midnight/blob/main/docs/images/kaiko-logo.png?raw=true)