---
license: gpl-3.0
language:
- en
metrics:
- f1
- accuracy
---
# HCAT-FusionNet: Multimodal Preprocessing and Fusion for Survival and Recurrence Prediction

HCAT-FusionNet uses variational autoencoders and cross-modal attention for holistic healthcare outcome prediction.

This repository contains the preprocessing pipelines and training framework for the **HANCOCK (Head and Neck Cancer Cohort)** dataset used in the **Hancothon25 Challenge** at MICCAI 2025. The challenge focuses on predicting **5-year survival** and **2-year recurrence** from multimodal patient data (clinical, pathological, semantic text, spatial histopathology, and temporal blood tests).
Our solution introduces **novel preprocessing, imputation, and fusion strategies** to extract robust 512-dimensional embeddings from heterogeneous modalities, followed by **advanced multi-modal training**.
---
## Features
* **Clinical Data Preprocessing**: Advanced imputation ensemble + VAE-based handling of missing data.
* **Pathological Data Preprocessing**: Probabilistic imputation with graph smoothing and 512-d embeddings.
* **Semantic Text Processing**: ClinicalBERT / TF-IDF + SVD pipelines for histories, reports, and surgery descriptions.
* **Spatial Histopathology Aggregation**: Transformer-based aggregation of patch-level features with spatial awareness.
* **Temporal Blood Data**: Physiology-aware normalization, KNN refinement, and LSTM encoder for sequential signals.
* **Fusion Training**: Multi-modal VAE with attention-based cross-modal imputation, joint latent space learning, and uncertainty quantification.
* **Evaluation**: Binary classification of survival and recurrence, reporting accuracy and F1-score.
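
To make the KNN refinement step in the list above concrete, here is a minimal pure-Python sketch of nearest-neighbour imputation on a toy table. It is not the repository's implementation: the function name `knn_impute`, the distance definition (root-mean-square difference over features both rows have observed), and the toy values are all assumptions made for illustration. The actual pipelines are described in the linked documentation.

```python
import math


def knn_impute(rows, k=2):
    """Fill None entries with the mean of that feature over the k nearest rows.

    Hypothetical helper: distance is the root-mean-square difference over
    the features that both rows have observed.
    """
    imputed = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for m, other in enumerate(rows):
                # Skip the row itself and rows that also lack feature j.
                if m == i or other[j] is None:
                    continue
                shared = [(a, b) for a, b in zip(row, other)
                          if a is not None and b is not None]
                if not shared:
                    continue
                dist = math.sqrt(
                    sum((a - b) ** 2 for a, b in shared) / len(shared)
                )
                candidates.append((dist, other[j]))
            candidates.sort(key=lambda c: c[0])
            nearest = [v for _, v in candidates[:k]]
            if nearest:
                imputed[i][j] = sum(nearest) / len(nearest)
    return imputed


# Toy table: three numeric features per patient, one missing value.
data = [
    [1.0, 2.0, 3.0],
    [1.1, 2.1, None],
    [0.9, 1.9, 2.8],
    [5.0, 6.0, 9.0],
]
filled = knn_impute(data, k=2)
print(filled[1])  # the missing entry is filled from the two closest rows
```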
---
## Documentation
Detailed explanations of each pipeline are available here:
* [Clinical Preprocessing](https://github.com/Ragu-123/hcat-fusionnet/blob/main/Documentation/clinical.md)
* [Pathological Preprocessing](https://github.com/Ragu-123/hcat-fusionnet/blob/main/Documentation/pathological.md)
* [Semantic Text Processing](https://github.com/Ragu-123/hcat-fusionnet/blob/main/Documentation/semantic.md)
* [Spatial Histopathology](https://github.com/Ragu-123/hcat-fusionnet/blob/main/Documentation/spatial.md)
* [Temporal Blood Data](https://github.com/Ragu-123/hcat-fusionnet/blob/main/Documentation/temporal.md)
* [Training Pipeline](https://github.com/Ragu-123/hcat-fusionnet/blob/main/Documentation/train.md)
---
## Results
The enhanced training pipeline (`train2.py`) achieved:
* **5-year Survival F1-score**: **0.80**
* **2-year Recurrence F1-score**: **0.95**
* **Average F1-score**: **0.875**
*(See [`enhanced_hcat_training_summary.json`](enhanced_hcat_training_summary.json) for full training logs and config.)*
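
For reference, the sketch below shows how accuracy and F1 are computed for a binary task, and how the average F1 above follows from the two per-task scores. The helper name `binary_metrics` and the toy labels are illustrative assumptions; they are not taken from `train2.py` or the summary JSON.

```python
import math


def binary_metrics(y_true, y_pred):
    """Return accuracy and F1 for binary labels (1 = positive class)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "f1": f1}


# Toy labels, for illustration only.
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred = [1, 1, 0, 0, 1, 1, 0, 1]
metrics = binary_metrics(y_true, y_pred)

# The average F1 is the arithmetic mean of the two reported task scores.
average_f1 = (0.80 + 0.95) / 2
print(metrics, average_f1)
```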
---
## Methods Summary
* **Imputation**: Multi-modal VAE, KNN, PCA, and graph smoothing.
* **Embeddings**: Standardized 512-d representations across modalities.
* **Fusion**: Attention-based cross-modal integration with uncertainty weighting.
* **Classification**: Binary prediction of survival and recurrence with robust evaluation.
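
One simple way to picture uncertainty weighting is a precision-weighted average, in which each modality embedding contributes in proportion to the inverse of its estimated variance. The sketch below illustrates that idea only; it is a stand-in, not the attention-based fusion used in this repository. The function name `fuse_by_uncertainty`, the 2-d toy vectors (the real embeddings are 512-d), and the variance values are assumptions.

```python
import math


def fuse_by_uncertainty(embeddings, variances):
    """Precision-weighted average of per-modality embeddings.

    embeddings: dict of modality name -> list of floats (equal lengths)
    variances:  dict of modality name -> positive scalar variance
    Hypothetical helper; lower variance means a larger fusion weight.
    """
    if not embeddings or set(embeddings) != set(variances):
        raise ValueError("embeddings and variances must share the same keys")
    precisions = {name: 1.0 / variances[name] for name in embeddings}
    total = sum(precisions.values())
    weights = {name: p / total for name, p in precisions.items()}
    dim = len(next(iter(embeddings.values())))
    fused = [
        sum(weights[name] * embeddings[name][d] for name in embeddings)
        for d in range(dim)
    ]
    return fused, weights


# Toy 2-d embeddings; the clinical modality is assumed to be more certain.
embeddings = {"clinical": [1.0, 0.0], "temporal": [0.0, 1.0]}
variances = {"clinical": 0.5, "temporal": 1.5}
fused, weights = fuse_by_uncertainty(embeddings, variances)
print(weights, fused)
```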
---
## Challenge Context
This work addresses the **HANCOCK multimodal dataset** provided for **Hancothon25 (MICCAI 2025)**. The dataset includes **763 patients** with the following modalities:
* Clinical structured data
* Pathology structured data
* Histopathology WSIs & TMAs
* Tabular blood test data
* Free-text clinical/surgery reports
Our framework is designed for **precision oncology**, enabling predictive modeling for treatment planning and follow-up.
---
## Performance & Insights
* The system demonstrates strong **generalization across modalities**.
* Temporal and pathological modalities improved recurrence prediction.
* Clinical and semantic features boosted survival classification.
* Fusion strategies with uncertainty modeling ensured robustness under missing modalities.
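
A common way to keep attention-style fusion well defined when a patient lacks some modalities is to mask the absent ones before the softmax, so the remaining weights still sum to one. The sketch below shows that general technique under stated assumptions: the function name `masked_softmax`, the modality order, and the scores are hypothetical, and the repository's actual mechanism may differ.

```python
import math


def masked_softmax(scores, present):
    """Softmax over the modalities flagged as present; absent ones get 0."""
    active = [s for s, p in zip(scores, present) if p]
    if not active:
        raise ValueError("at least one modality must be present")
    peak = max(active)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) if p else 0.0
            for s, p in zip(scores, present)]
    total = sum(exps)
    return [e / total for e in exps]


# Assumed order: clinical, pathological, semantic, spatial, temporal.
scores = [1.0, 1.0, 0.2, 3.0, 1.0]
present = [True, True, False, False, True]
attn = masked_softmax(scores, present)
print(attn)  # absent modalities receive zero weight
```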
---
## Citation
If you use this repository, please cite:
```
@inproceedings{hcat_fusionnet_2025,
title={HCAT-FusionNet: Multimodal Preprocessing and Fusion for Survival and Recurrence Prediction},
author={Ragunath R and Sanjay S and Harish G},
booktitle={MICCAI Hancothon25 Challenge},
year={2025}
}
```
---
## Contact
For questions or collaborations:
* **Authors**: [Harish G](https://github.com/Harish2404lll), [Ragunath R](https://github.com/Ragu-123), [Sanjay S](https://github.com/22002102)