STA-Former: enhancing medical image segmentation with Shrinkage Triplet Attention in a hybrid CNN-Transformer model

Liu, Yuzhao; Han, Liming; Yao, Bin; Li, Qing

doi:10.1007/s11760-023-02893-5

Original Paper
Published: 13 December 2023

Signal, Image and Video Processing, volume 18, pages 1901–1910 (2024)

Yuzhao Liu^1,2, Liming Han^1,2, Bin Yao^1 & Qing Li^1,2

Abstract

Convolutional neural networks (CNNs) are widely used in medical image segmentation, but they are limited in capturing long-range semantic interactions. Transformers, conversely, handle long-range dependencies well but struggle to preserve local semantic details. To address this challenge, we propose STA-Former, a hybrid CNN-Transformer model for medical image segmentation. Our approach rests on three design principles: (1) We propose the Shrinkage Triplet Attention (STA) module to enhance feature fusion within the decoder. It focuses on spatial and channel interactions in the feature map, computes thresholds across dimensions, and suppresses irrelevant information through soft-thresholding. (2) We present a redesigned hierarchical hybrid CNN-Transformer encoder that connects CNN and Transformer blocks at multiple scales, enabling the capture of both long-range and short-range dependencies across feature maps of various scales. (3) Unlike traditional decoders that apply the attention mechanism exclusively to low-level features, our approach uses a multiscale attention hierarchical decoder that leverages feature map correlations at different scales for effective feature fusion. Our method outperforms state-of-the-art methods on three datasets: Synapse multi-organ CT, ACDC cardiac MRI, and breast ultrasound images (BUSI).
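
The soft-thresholding step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' STA module: the abstract does not say how the thresholds are computed, so the rule used here (a fixed scale `alpha` times the mean absolute activation of each channel, loosely in the spirit of deep residual shrinkage networks) is an assumption, and the names `soft_threshold`, `channel_threshold`, and `shrink_feature_map` are hypothetical. In a trained model the scale would typically be learned rather than fixed.

```python
from typing import List


def soft_threshold(x: float, tau: float) -> float:
    """Shrink x toward zero by tau; any value with |x| <= tau becomes 0."""
    magnitude = abs(x) - tau
    if magnitude <= 0.0:
        return 0.0
    return magnitude if x > 0 else -magnitude


def channel_threshold(channel: List[float], alpha: float) -> float:
    """Assumed threshold rule: alpha times the mean absolute activation."""
    return alpha * sum(abs(v) for v in channel) / len(channel)


def shrink_feature_map(features: List[List[float]], alpha: float = 0.5) -> List[List[float]]:
    """Soft-threshold each channel using that channel's own threshold."""
    shrunk = []
    for channel in features:
        tau = channel_threshold(channel, alpha)
        shrunk.append([soft_threshold(v, tau) for v in channel])
    return shrunk


# Toy feature map: two channels, four (flattened) spatial positions each.
features = [
    [0.1, -0.2, 3.0, -0.1],
    [-4.0, 0.5, 0.5, 1.0],
]
shrunk = shrink_feature_map(features, alpha=0.5)
print(shrunk)
```

In this toy input, the small activations in each channel fall below the channel's threshold and are set to zero, while the large ones keep their sign and lose the threshold value in magnitude. That is the sense in which soft-thresholding suppresses low-magnitude responses without discarding strong ones.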

Data availability

The authors confirm that the data supporting the findings of this study are available within the article and openly available in [Synapse], [ACDC], and [BUSI].

Funding

Not applicable.

Author information

Authors and Affiliations

Manufacturing Electronics Research and Development Center, Institute of Microelectronics of the Chinese Academy of Sciences, No.3 Beitucheng West Road, Beijing, 100029, China
Yuzhao Liu, Liming Han, Bin Yao & Qing Li
School of Integrated Circuits, University of Chinese Academy of Sciences, Zhongguancun South Road, Beijing, 100020, China
Yuzhao Liu, Liming Han & Qing Li

Contributions

Yuzhao Liu wrote the main manuscript text. Liming Han, Bin Yao, and Qing Li provided important suggestions for the manuscript. All authors reviewed the manuscript.

Corresponding author

Correspondence to Qing Li.

Ethics declarations

Conflict of interest

Not applicable.

Ethical Approval

Not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Liu, Y., Han, L., Yao, B. et al. STA-Former: enhancing medical image segmentation with Shrinkage Triplet Attention in a hybrid CNN-Transformer model. SIViP 18, 1901–1910 (2024). https://doi.org/10.1007/s11760-023-02893-5

Received: 16 June 2023
Revised: 27 October 2023
Accepted: 14 November 2023
Published: 13 December 2023
Issue Date: March 2024
DOI: https://doi.org/10.1007/s11760-023-02893-5
