Convolution-Free Medical Image Segmentation Using Transformers

  • Conference paper
Medical Image Computing and Computer Assisted Intervention – MICCAI 2021 (MICCAI 2021)

Abstract

Like other applications in computer vision, medical image segmentation has been most successfully addressed using deep learning models that rely on the convolution operation as their main building block. Convolutions enjoy important properties such as sparse interactions, weight sharing, and translation equivariance. These properties give convolutional neural networks (CNNs) a strong and useful inductive bias for vision tasks. However, the convolution operation also has important shortcomings: it performs a fixed operation on every test image regardless of the content, and it cannot efficiently model long-range interactions. In this work, we show that a network based on self-attention between neighboring patches and without any convolution operations can achieve better results. Given a 3D image block, our network divides it into \(n^3\) 3D patches, where \(n=3 \text { or } 5\), and computes a 1D embedding for each patch. The network predicts the segmentation map for the center patch of the block based on the self-attention between these patch embeddings. We show that the proposed model can achieve higher segmentation accuracies than a state-of-the-art CNN. For scenarios with very few labeled images, we propose methods for pre-training the network on large corpora of unlabeled images. Our experiments show that, with pre-training, the advantage of our proposed network over CNNs can be significant when little labeled training data is available.
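
The patch-based pipeline described in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation: it uses a single attention head, a single layer, and random untrained weights, and every name in it (`split_into_patches`, `init_params`, `segment_center_patch`, the embedding size `d`, the additive positional term) is a hypothetical choice made for illustration. It only shows the data flow: a cubic block is split into \(n^3\) patches, each patch is mapped to a 1D embedding, self-attention mixes the embeddings, and the center patch's token is decoded into a label map for that patch.

```python
import numpy as np


def split_into_patches(block, n):
    """Split a cubic block of side n*p into n**3 flattened patches of p**3 voxels."""
    side = block.shape[0]
    assert block.shape == (side, side, side) and side % n == 0
    p = side // n
    # (n, p, n, p, n, p) -> (n, n, n, p, p, p) -> (n**3, p**3), row-major patch order
    patches = block.reshape(n, p, n, p, n, p).transpose(0, 2, 4, 1, 3, 5)
    return patches.reshape(n ** 3, p ** 3)


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(tokens, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over patch embeddings."""
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    weights = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)
    return weights @ v, weights


def init_params(n, p, d, num_classes, seed=0):
    """Random, untrained weights (assumption: sizes chosen only for illustration)."""
    rng = np.random.default_rng(seed)
    scale = 0.1
    return {
        "w_embed": rng.normal(0, scale, (p ** 3, d)),   # patch -> 1D embedding
        "pos": rng.normal(0, scale, (n ** 3, d)),       # additive positional term
        "w_q": rng.normal(0, scale, (d, d)),
        "w_k": rng.normal(0, scale, (d, d)),
        "w_v": rng.normal(0, scale, (d, d)),
        "w_out": rng.normal(0, scale, (d, p ** 3 * num_classes)),
    }


def segment_center_patch(block, n, params, num_classes):
    """Predict a (p, p, p) label map for the center patch of the block."""
    p = block.shape[0] // n
    tokens = split_into_patches(block, n) @ params["w_embed"] + params["pos"]
    attended, weights = self_attention(
        tokens, params["w_q"], params["w_k"], params["w_v"]
    )
    center = (n ** 3) // 2  # row-major index of the center patch when n is odd
    logits = (attended[center] @ params["w_out"]).reshape(p, p, p, num_classes)
    return logits.argmax(axis=-1), weights


# Usage: a 6x6x6 block split into 3**3 patches of 2x2x2 voxels each.
example_block = np.random.default_rng(1).random((6, 6, 6))
example_params = init_params(n=3, p=2, d=16, num_classes=2)
example_labels, example_weights = segment_center_patch(
    example_block, 3, example_params, num_classes=2
)
print(example_labels.shape, example_weights.shape)
```

Because the attention weights are computed from the patch contents, the mixing applied to each block depends on the image itself, which is the contrast with a fixed convolution kernel that the abstract draws.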

Acknowledgement

This work was supported in part by the National Institutes of Health (NIH) award numbers R01NS106030, R01EB018988, and R01EB031849; by the Office of the Director of the NIH under number S10OD0250111; and by a Technological Innovations in Neuroscience Award from the McKnight Foundation. The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH or the McKnight Foundation.

Author information

Corresponding author

Correspondence to Davood Karimi.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Karimi, D., Vasylechko, S.D., Gholipour, A. (2021). Convolution-Free Medical Image Segmentation Using Transformers. In: de Bruijne, M., et al. Medical Image Computing and Computer Assisted Intervention – MICCAI 2021. MICCAI 2021. Lecture Notes in Computer Science, vol 12901. Springer, Cham. https://doi.org/10.1007/978-3-030-87193-2_8

  • DOI: https://doi.org/10.1007/978-3-030-87193-2_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-87192-5

  • Online ISBN: 978-3-030-87193-2

  • eBook Packages: Computer Science (R0)