Abstract
Learning informative representations is crucial for classification and prediction tasks on histopathological images. Because whole-slide images are extremely large, whole-slide histopathological image analysis is normally addressed with a multi-instance learning (MIL) scheme. However, the weakly supervised nature of MIL makes it challenging to learn an effective whole-slide-level representation. To tackle this issue, we present a novel embedded-space MIL model based on a deformable transformer (DT) architecture and convolutional layers, which we term DT-MIL. The DT architecture enables our MIL model to update each instance feature by globally aggregating the instance features in a bag simultaneously, and to encode the position context information of instances during bag representation learning. Compared with other state-of-the-art MIL models, our model has the following advantages: (1) it generates the bag representation in a fully trainable way, (2) it represents the bag with a high-level, nonlinear combination of all instances instead of fixed pooling-based methods (e.g. max pooling and average pooling) or simple attention-based linear aggregation, and (3) it encodes the position relationship and context information during the bag embedding phase. Besides our proposed DT-MIL, we also develop other possible transformer-based MIL models for comparison. Extensive experiments show that DT-MIL outperforms the state-of-the-art methods and the other transformer-based MIL architectures in histopathological image classification and prediction tasks. An open-source implementation of our approach can be found at https://github.com/yfzon/DT-MIL.
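To make the contrast in advantage (2) concrete, the following is a minimal sketch of the bag aggregation baselines the abstract mentions: fixed max and average pooling, and attention-based linear aggregation. It is not the DT-MIL architecture and is not taken from the authors' repository. The function names, the toy bag, and the parameters `V` and `w` are hypothetical; the attention scoring form (a softmax over `w · tanh(V h_k)`) is one common formulation and is assumed here for illustration only.

```python
import math

def max_pool(bag):
    """Fixed pooling: element-wise maximum over all instance embeddings."""
    return [max(column) for column in zip(*bag)]

def mean_pool(bag):
    """Fixed pooling: element-wise average over all instance embeddings."""
    return [sum(column) / len(bag) for column in zip(*bag)]

def attention_pool(bag, V, w):
    """Attention-based linear aggregation (illustrative assumption).

    Each instance h_k gets a score w . tanh(V h_k); the scores are
    softmax-normalised into weights a_k, and the bag embedding is the
    weighted sum of the instances. V and w would normally be learned;
    here they are hypothetical fixed values.
    """
    scores = []
    for h in bag:
        hidden = [math.tanh(sum(v * x for v, x in zip(row, h))) for row in V]
        scores.append(sum(wi * ti for wi, ti in zip(w, hidden)))
    top = max(scores)  # subtract the max for a numerically stable softmax
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(bag[0])
    pooled = [sum(a * h[d] for a, h in zip(weights, bag)) for d in range(dim)]
    return pooled, weights

# Toy bag: three instances (e.g. patch embeddings), two features each.
bag = [[1.0, 0.0], [0.0, 2.0], [3.0, 1.0]]
V = [[0.5, -0.5], [0.1, 0.3]]  # hypothetical projection matrix
w = [1.0, -1.0]                # hypothetical scoring vector

bag_max = max_pool(bag)
bag_mean = mean_pool(bag)
bag_att, att_weights = attention_pool(bag, V, w)
```

In all three cases the bag embedding is either a fixed function or a linear (convex) combination of the instance embeddings, and the instance order or position plays no role. The abstract's claim is that DT-MIL instead produces a nonlinear combination that also encodes position context.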
H. Li, F. Yang and Y. Zhao contributed equally to this work.
Acknowledgements
This work was partially funded by the National Key R&D Program of China (2018YFC2000702).
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Li, H. et al. (2021). DT-MIL: Deformable Transformer for Multi-instance Learning on Histopathological Image. In: de Bruijne, M., et al. (eds.) Medical Image Computing and Computer Assisted Intervention – MICCAI 2021. Lecture Notes in Computer Science, vol 12908. Springer, Cham. https://doi.org/10.1007/978-3-030-87237-3_20
DOI: https://doi.org/10.1007/978-3-030-87237-3_20
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-87236-6
Online ISBN: 978-3-030-87237-3
eBook Packages: Computer Science, Computer Science (R0)