HT-Net: hierarchical context-attention transformer network for medical ct image segmentation

Applied Intelligence

Abstract

Convolutional neural networks (CNNs) have been a prevailing technique in medical CT image processing. Although encoder-decoder CNNs exploit locality for efficiency, they cannot adequately model long-range pixel relationships. Recent work shows that stacking self-attention or transformer layers can effectively learn long-range dependencies, and transformers have been extended to computer vision tasks by splitting images into patches and treating the patches as embeddings. However, transformer-based architectures lack global semantic information interaction and require large-scale datasets for training, which makes them difficult to train effectively with limited data. To address these issues, we propose a hierarchical context-attention transformer network (HT-Net), which integrates multi-scale, transformer, and hierarchical context extraction modules in the skip connections. The multi-scale module captures richer CT semantic information, enabling the transformers to better encode feature maps from different CNN stages, tokenized into image patches, as input attention sequences. The hierarchical context attention module complements global information and re-weights the pixels to capture semantic context. Extensive experiments on three datasets demonstrate that the proposed HT-Net outperforms state-of-the-art approaches.
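
The idea of tokenizing a CNN feature map into patches and feeding the tokens to self-attention can be illustrated with a minimal NumPy sketch. This is not the HT-Net implementation: the function names `patchify` and `self_attention`, the feature-map size, the patch size, and the random projection weights are all assumptions chosen for illustration, and the sketch uses a single attention head with no positional encoding or learned parameters.

```python
import numpy as np

def patchify(feature_map, patch):
    """Split a (C, H, W) feature map into flattened, non-overlapping patches.

    Returns an array of shape (num_patches, C * patch * patch).
    """
    c, h, w = feature_map.shape
    assert h % patch == 0 and w % patch == 0, "H and W must be divisible by patch"
    gh, gw = h // patch, w // patch
    x = feature_map.reshape(c, gh, patch, gw, patch)
    x = x.transpose(1, 3, 0, 2, 4)  # (gh, gw, C, patch, patch)
    return x.reshape(gh * gw, c * patch * patch)

def self_attention(tokens, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over a token sequence."""
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ v, weights

# Hypothetical sizes: an 8-channel 16x16 feature map cut into 4x4 patches.
rng = np.random.default_rng(0)
feature_map = rng.standard_normal((8, 16, 16))
tokens = patchify(feature_map, patch=4)  # 16 tokens, each of dimension 128

d_model, d_head = tokens.shape[1], 32
w_q = rng.standard_normal((d_model, d_head)) * 0.1  # random stand-ins for
w_k = rng.standard_normal((d_model, d_head)) * 0.1  # learned projections
w_v = rng.standard_normal((d_model, d_head)) * 0.1

out, weights = self_attention(tokens, w_q, w_k, w_v)
print(tokens.shape, out.shape, weights.shape)
```

Each row of `weights` describes how strongly one patch attends to every other patch, regardless of spatial distance, which is the long-range dependency that a purely local convolution does not capture in a single layer.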

Acknowledgements

This work was supported in part by the National Natural Science Foundation of China under Grant 61762014, in part by the Science and Technology Project of Guangxi under Grant 2018GXNSFAA281351, and in part by the Innovation Project of Guangxi Graduate Education under Grant YCSW2021096.

Author information

Corresponding author

Correspondence to Haiying Xia.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Ma, M., Xia, H., Tan, Y. et al. HT-Net: hierarchical context-attention transformer network for medical ct image segmentation. Appl Intell 52, 10692–10705 (2022). https://doi.org/10.1007/s10489-021-03010-0

  • DOI: https://doi.org/10.1007/s10489-021-03010-0
