Multi-modal Pathological Pre-training via Masked Autoencoders for Breast Cancer Diagnosis

Lu, Mengkang; Wang, Tianyi; Xia, Yong

doi:10.1007/978-3-031-43987-2_44

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14225)

Included in the following conference series:

International Conference on Medical Image Computing and Computer-Assisted Intervention

Abstract

Breast cancer (BC) is one of the most commonly diagnosed cancers among women worldwide and has become the leading cause of cancer death among them. Multi-modal pathological images provide complementary information for BC diagnosis. Hematoxylin and eosin (H&E) staining images reveal a considerable amount of microscopic anatomy, while immunohistochemical (IHC) staining images allow the expression of various biomarkers, such as human epidermal growth factor receptor 2 (HER2), to be evaluated. In this paper, we propose a multi-modal pre-training model based on pathological images for BC diagnosis. The proposed pre-training model contains three modules: (1) the modal-fusion encoder, (2) the mixed attention, and (3) the modal-specific decoders. The pre-trained model can be applied to multiple related tasks (IHC reconstruction and IHC classification). Experiments on two datasets (HEROHE Challenge and BCI Challenge) show state-of-the-art results.
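
As a rough illustration of the masked-autoencoder idea behind this kind of pre-training, the sketch below prepares the input for a shared encoder from a paired H&E and IHC patch grid: each modality is masked independently, the visible patches of both are concatenated, and the masked patches become the reconstruction targets for a decoder per modality. It is not the authors' implementation. The function names, the 75% mask ratio, and the 196-patch grid are assumptions chosen for the example; the abstract does not state them.

```python
import random


def random_mask(num_patches, mask_ratio, rng):
    """Split patch indices into (visible, masked) lists by random sampling."""
    num_masked = int(num_patches * mask_ratio)
    order = list(range(num_patches))
    rng.shuffle(order)
    masked = sorted(order[:num_masked])
    visible = sorted(order[num_masked:])
    return visible, masked


def build_encoder_input(num_patches, mask_ratio=0.75, seed=0):
    """Build the fused token list and the per-modality reconstruction targets.

    Tokens are (modality, patch_index) pairs standing in for patch embeddings.
    The mask ratio is an assumed value, not one taken from the paper.
    """
    rng = random.Random(seed)
    tokens = []   # visible patches from both modalities, fed to one encoder
    targets = {}  # masked patch indices each modality's decoder must rebuild
    for modality in ("HE", "IHC"):
        visible, masked = random_mask(num_patches, mask_ratio, rng)
        tokens.extend((modality, index) for index in visible)
        targets[modality] = masked
    return tokens, targets


# 196 patches corresponds to a 14 x 14 grid, an assumed example size.
tokens, targets = build_encoder_input(num_patches=196)
print(len(tokens), len(targets["HE"]), len(targets["IHC"]))
```

Tagging each token with its modality keeps the fused sequence unambiguous, so a decoder for one stain can pick out its own positions while still attending to context from the other stain.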

Acknowledgment

This work was supported in part by the Key Research and Development Program of Shaanxi Province, China, under Grant 2022GY-084, in part by the National Natural Science Foundation of China under Grant 62171377, and in part by the Key Technologies Research and Development Program under Grant 2022YFC2009903/2022YFC2009900.

Author information

Authors and Affiliations

National Engineering Laboratory for Integrated Aero-Space-Ground-Ocean Big Data Application Technology, School of Computer Science and Engineering, Northwestern Polytechnical University, Xi’an, 710072, China
Mengkang Lu, Tianyi Wang & Yong Xia

Corresponding author

Correspondence to Yong Xia.

Editor information

Editors and Affiliations

Icahn School of Medicine, Mount Sinai, NYC, NY, USA; Tel Aviv University, Tel Aviv, Israel
Hayit Greenspan
Emory University, Atlanta, GA, USA
Anant Madabhushi
Queen’s University, Kingston, ON, Canada
Parvin Mousavi
The University of British Columbia, Vancouver, BC, Canada
Septimiu Salcudean
Yale University, New Haven, CT, USA
James Duncan
IBM Research, San Jose, CA, USA
Tanveer Syeda-Mahmood
Johns Hopkins University, Baltimore, MD, USA
Russell Taylor

Cite this paper

Lu, M., Wang, T., Xia, Y. (2023). Multi-modal Pathological Pre-training via Masked Autoencoders for Breast Cancer Diagnosis. In: Greenspan, H., et al. Medical Image Computing and Computer Assisted Intervention – MICCAI 2023. MICCAI 2023. Lecture Notes in Computer Science, vol 14225. Springer, Cham. https://doi.org/10.1007/978-3-031-43987-2_44

DOI: https://doi.org/10.1007/978-3-031-43987-2_44
Published: 01 October 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-43986-5
Online ISBN: 978-3-031-43987-2
eBook Packages: Computer Science, Computer Science (R0)