LATrans-Unet: Improving CNN-Transformer with Location Adaptive for Medical Image Segmentation

Lin, Qiqin; Yao, Junfeng; Hong, Qingqi; Cao, Xianpeng; Zhou, Rongzhou; Xie, Weixing

doi:10.1007/978-981-99-8558-6_19

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14437)

Included in the following conference series:

Chinese Conference on Pattern Recognition and Computer Vision (PRCV)

Abstract

Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) have been widely employed in medical image segmentation. While CNNs excel at local feature encoding, their ability to capture long-range dependencies is limited. In contrast, ViTs have strong global modeling capabilities. However, existing attention-based ViT models have difficulty adaptively preserving accurate location information, which leaves them unable to handle variations in important information within medical images. To inherit the merits of CNNs and ViTs while avoiding their respective limitations, we propose a novel framework called LATrans-Unet. By comprehensively enhancing the representation of information at both shallow and deep levels, LATrans-Unet maximizes the integration of location information and contextual details. At the shallow levels, a skip connection called SimAM-skip emphasizes information boundaries and bridges the encoder-decoder semantic gap. Additionally, to capture organ shape and location variations in medical images, we propose Location-Adaptive Attention at the deep levels. It enables accurate segmentation by guiding the model to track changes globally and adaptively. Extensive experiments on multi-organ and cardiac segmentation tasks validate the superior performance of LATrans-Unet compared to previous state-of-the-art methods. The code and trained models will be made available soon.
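
The abstract names SimAM as the basis of the skip connection but gives no further detail. As a rough illustration only, the parameter-free SimAM weighting (as published separately by Yang et al., 2021) can be sketched for a single-channel feature map as follows. How LATrans-Unet wires this into its skip connections is not described in this preview, so the function name `simam_weight`, the `lam` default, and the plain-Python list representation are assumptions made for this sketch, not the authors' implementation.

```python
import math


def simam_weight(feature_map, lam=1e-4):
    """Apply SimAM-style parameter-free attention to one H x W channel.

    feature_map: list of rows (lists of floats) for a single channel.
    lam: small regularizer (hypothetical default chosen for this sketch).
    """
    values = [v for row in feature_map for v in row]
    n = len(values) - 1  # number of positions other than the target one
    if n < 1:
        raise ValueError("need at least two spatial positions")

    mean = sum(values) / len(values)
    # Squared deviation of every position from the channel mean.
    sq_dev = [[(v - mean) ** 2 for v in row] for row in feature_map]
    variance = sum(d for row in sq_dev for d in row) / n

    out = []
    for row, dev_row in zip(feature_map, sq_dev):
        out_row = []
        for v, d in zip(row, dev_row):
            # Positions that stand out from the channel mean get a larger score.
            inv_energy = d / (4.0 * (variance + lam)) + 0.5
            gate = 1.0 / (1.0 + math.exp(-inv_energy))  # sigmoid
            out_row.append(v * gate)
        out.append(out_row)
    return out


if __name__ == "__main__":
    # A single bright position on a flat background (toy data).
    fmap = [[1.0, 1.0, 1.0], [1.0, 5.0, 1.0], [1.0, 1.0, 1.0]]
    for row in simam_weight(fmap):
        print(row)
```

In this toy input the centre value receives a larger gate than the flat background, which is the kind of emphasis on distinctive positions (for example, boundaries) that the abstract attributes to SimAM-skip.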

The paper is supported by the Natural Science Foundation of China (No. 62072388), the industry guidance project foundation of science technology bureau of Fujian province in 2020 (No. 2020H0047), the Fujian Sunshine Charity Foundation, and ITC-InnoHK.

Author information

Authors and Affiliations

Center for Digital Media Computing, School of Film, School of Informatics, Xiamen University, Xiamen, 361005, China
Qiqin Lin, Junfeng Yao, Qingqi Hong, Xianpeng Cao, Rongzhou Zhou & Weixing Xie
Key Laboratory of Digital Protection and Intelligent Processing of Intangible Cultural Heritage of Fujian and Taiwan, Ministry of Culture and Tourism, Xiamen, China
Junfeng Yao
Institute of Artificial Intelligence, Xiamen University, Xiamen, 361005, China
Junfeng Yao & Qingqi Hong
Hong Kong Centre for Cerebro-Cardiovascular Health Engineering (COCHE), Hong Kong, China
Qingqi Hong

Corresponding authors

Correspondence to Junfeng Yao or Qingqi Hong.

Editor information

Editors and Affiliations

Nanjing University of Information Science and Technology, Nanjing, China
Qingshan Liu
Xiamen University, Xiamen, China
Hanzi Wang
Beijing University of Posts and Telecommunications, Beijing, China
Zhanyu Ma
Sun Yat-sen University, Guangzhou, China
Weishi Zheng
Peking University, Beijing, China
Hongbin Zha
Chinese Academy of Sciences, Beijing, China
Xilin Chen
Chinese Academy of Sciences, Beijing, China
Liang Wang
Xiamen University, Xiamen, China
Rongrong Ji

About this paper

Cite this paper

Lin, Q., Yao, J., Hong, Q., Cao, X., Zhou, R., Xie, W. (2024). LATrans-Unet: Improving CNN-Transformer with Location Adaptive for Medical Image Segmentation. In: Liu, Q., et al. Pattern Recognition and Computer Vision. PRCV 2023. Lecture Notes in Computer Science, vol 14437. Springer, Singapore. https://doi.org/10.1007/978-981-99-8558-6_19

DOI: https://doi.org/10.1007/978-981-99-8558-6_19
Published: 26 December 2023
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-8557-9
Online ISBN: 978-981-99-8558-6
eBook Packages: Computer Science, Computer Science (R0)