Abstract
Medical image segmentation has long played a vital role in helping healthcare professionals treat disease, and convolutional neural networks have achieved remarkable success in this domain. Among these networks, the encoder-decoder architecture stands out as a classic and effective model for medical image segmentation. However, several challenges remain: boundaries between structures are often indistinct, targets can have irregular shapes, and small lesions are difficult to segment accurately. To address these limitations, we propose the Encoder Activation Diffusion and Decoder Transformer Fusion Network (ADTF). Specifically, we propose a novel Lightweight Convolution Modulation (LCM) module, built on a gated attention mechanism that uses convolution to encode spatial features; LCM replaces the convolutional layers of the encoder-decoder network. To better integrate spatial information and dynamically extract more valuable high-order semantic information, we introduce Encoder Activation Diffusion (EAD) blocks after the encoder, which enable the network to produce complete segmentations. Furthermore, we use a Transformer-based multi-scale feature fusion module in the decoder (MDFT) to achieve global interaction among multi-scale features. To validate our approach, we conduct experiments on multiple medical image segmentation datasets. Experimental results demonstrate that our model outperforms other state-of-the-art (SOTA) methods on commonly used evaluation metrics.
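The abstract characterizes LCM only as a gated attention mechanism that uses convolution to encode spatial features; its exact design is not given here. As a rough illustration, the NumPy sketch below implements a Conv2Former-style convolutional modulation block: a depthwise-convolved branch, passed through a sigmoid so that it acts as a spatial gate in (0, 1), multiplies a pointwise-projected value branch, and the result is projected back. The function names, the sigmoid gate, the (H, W, C) layout and the random weights are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def depthwise_conv2d(x, kernels):
    """Depthwise 2D convolution with zero 'same' padding.

    x: (H, W, C) feature map; kernels: (k, k, C) with odd k, one k x k filter
    per channel. Computes cross-correlation, as CNN libraries do.
    """
    k = kernels.shape[0]
    pad = k // 2
    h, w, _ = x.shape
    xp = np.pad(x, ((pad, pad), (pad, pad), (0, 0)))  # zero padding on H and W only
    out = np.zeros_like(x)
    for i in range(k):
        for j in range(k):
            # Add the shifted window, scaled by each channel's own weight.
            out += xp[i:i + h, j:j + w, :] * kernels[i, j, :]
    return out


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def conv_modulation(x, w_a, w_v, w_o, dw_kernels):
    """Hypothetical gated convolutional-modulation block (not the paper's exact LCM).

    x: (H, W, C); w_a, w_v, w_o: (C, C) pointwise (1x1) projections;
    dw_kernels: (k, k, C) depthwise filters that give the gate its spatial context.
    """
    a = sigmoid(depthwise_conv2d(x @ w_a, dw_kernels))  # assumed spatial gate in (0, 1)
    v = x @ w_v                                          # value branch
    return (a * v) @ w_o                                 # gate the values, then project


# Usage: a 32 x 32 map with 16 channels and a 7 x 7 depthwise kernel.
rng = np.random.default_rng(0)
c, k = 16, 7
x = rng.standard_normal((32, 32, c))
w_a, w_v, w_o = 0.1 * rng.standard_normal((3, c, c))
y = conv_modulation(x, w_a, w_v, w_o, rng.standard_normal((k, k, c)) / k**2)
print(y.shape)  # (32, 32, 16)
```

Because every operation is either pointwise (the three 1x1 projections) or depthwise (one k x k filter per channel), the block has roughly 3C^2 + k^2 C parameters instead of the k^2 C^2 of a standard k x k convolution, which is one plausible reading of "lightweight" here.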
This work was supported by the National Natural Science Foundation of China (NSFC) under grant numbers 62272342, 62020106004, 92048301.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Li, X., Xu, G., Zhao, M., Shi, F., Wang, H. (2024). Encoder Activation Diffusion and Decoder Transformer Fusion Network for Medical Image Segmentation. In: Liu, Q., et al. Pattern Recognition and Computer Vision. PRCV 2023. Lecture Notes in Computer Science, vol 14437. Springer, Singapore. https://doi.org/10.1007/978-981-99-8558-6_16
DOI: https://doi.org/10.1007/978-981-99-8558-6_16
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-8557-9
Online ISBN: 978-981-99-8558-6
eBook Packages: Computer Science, Computer Science (R0)