Abstract
Following the remarkable performance of Transformers in 2D medical image segmentation, recent studies have extended them to 3D medical segmentation tasks. Unlike the convolution operations in CNNs, Transformer-based models rely on self-attention, which allows them to capture long-range dependencies among voxels. To address the high computational cost of the Transformer architecture on volumetric images with many slices, we propose PAT-Unet, an efficient hybrid CNN-Transformer architecture for 3D medical image segmentation. First, our Paired Attention Transformer (PAT) blocks reduce the spatial dimensions of 3D feature maps while jointly learning channel and spatial information, improving segmentation performance while lowering the parameter count and speeding up computation. Second, our Deformable Enhanced Skip Connection (DESC) module captures fine details in irregularly shaped lesion areas by learning volumetric spatial offsets. Finally, we validate the effectiveness and efficiency of our model on the Synapse and ACDC benchmark datasets. On the Synapse dataset, our model achieves a Dice similarity score of 87.17% while reducing parameters and FLOPs by 67% compared with the best existing methods reported in the literature.
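For readers who want a concrete picture of the first idea named above, the sketch below is a minimal, illustrative PyTorch reconstruction of a paired spatial/channel attention block that reduces the spatial size of keys and values before attending over a 3D feature map. It is based only on the description in this abstract, so the class name PairedAttention3D, the sr_ratio parameter, and every implementation detail are our own assumptions, not the authors' PAT or DESC code; the deformable skip connection is omitted because it would require a 3D deformable-convolution operator that standard PyTorch does not provide.

import torch
import torch.nn as nn


class PairedAttention3D(nn.Module):
    """Toy paired spatial/channel attention with spatial reduction for 3D features."""

    def __init__(self, dim, num_heads=4, sr_ratio=2):
        super().__init__()
        assert dim % num_heads == 0
        self.num_heads = num_heads
        self.scale = (dim // num_heads) ** -0.5
        self.q = nn.Linear(dim, dim)
        self.kv = nn.Linear(dim, 2 * dim)
        # Strided 3D convolution shrinks (D, H, W) of keys/values before attention,
        # which is where the main computational saving comes from.
        self.sr = nn.Conv3d(dim, dim, kernel_size=sr_ratio, stride=sr_ratio)
        self.norm = nn.LayerNorm(dim)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):
        # x: (B, C, D, H, W) volumetric feature map
        B, C, D, H, W = x.shape
        h, c = self.num_heads, C // self.num_heads

        tokens = x.flatten(2).transpose(1, 2)                      # (B, N, C), N = D*H*W
        q = self.q(tokens).reshape(B, -1, h, c).transpose(1, 2)    # (B, h, N, c)

        kv_tokens = self.norm(self.sr(x).flatten(2).transpose(1, 2))  # (B, Nk, C), Nk << N
        k, v = self.kv(kv_tokens).chunk(2, dim=-1)
        k = k.reshape(B, -1, h, c).transpose(1, 2)                 # (B, h, Nk, c)
        v = v.reshape(B, -1, h, c).transpose(1, 2)                 # (B, h, Nk, c)

        # Spatial branch: scaled dot-product attention over the reduced token set.
        spatial = (q @ k.transpose(-2, -1) * self.scale).softmax(dim=-1) @ v  # (B, h, N, c)

        # Channel branch: a c-by-c attention map built from keys and values,
        # so its cost does not grow with the number of voxels.
        channel_map = (k.transpose(-2, -1) @ v * self.scale).softmax(dim=-1)  # (B, h, c, c)
        channel = q @ channel_map                                             # (B, h, N, c)

        out = self.proj((spatial + channel).transpose(1, 2).reshape(B, -1, C))
        return out.transpose(1, 2).reshape(B, C, D, H, W)


if __name__ == "__main__":
    block = PairedAttention3D(dim=64, num_heads=4, sr_ratio=2)
    vol = torch.randn(1, 64, 16, 32, 32)   # (B, C, D, H, W), dims divisible by sr_ratio
    print(block(vol).shape)                # torch.Size([1, 64, 16, 32, 32])

The point of this sketch is the pairing itself: the spatial branch attends over a strided-down set of key/value tokens, and the channel branch works with a small C-by-C map, so neither term scales quadratically with the full voxel count.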
Acknowledgements
This work is supported in part by the Key R&D Program of Shandong Province (2021SFGC0101), the 20 Planned Projects in Jinan (202228120), and the National Key Research and Development Plan under Grant No. 2019YFB1404700.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Zou, Q., Zhao, J., Li, M., Yuan, L. (2024). PAT-Unet: Paired Attention Transformer for Efficient and Accurate Segmentation of 3D Medical Images. In: Liu, Q., et al. Pattern Recognition and Computer Vision. PRCV 2023. Lecture Notes in Computer Science, vol 14437. Springer, Singapore. https://doi.org/10.1007/978-981-99-8558-6_30
Print ISBN: 978-981-99-8557-9
Online ISBN: 978-981-99-8558-6