A pixel and channel enhanced up-sampling module for biomedical image segmentation

  • Research
  • Published in Machine Vision and Applications

Abstract

Up-sampling operations are frequently used to recover the spatial resolution of feature maps in neural networks for segmentation tasks. However, current up-sampling methods, such as bilinear interpolation and deconvolution, do not fully exploit the relationships within feature maps, which hinders learning discriminative features for semantic segmentation. In this paper, we propose a pixel and channel enhanced up-sampling (PCE) module for low-resolution feature maps, which uses the relationships among adjacent pixels and channels to learn discriminative high-resolution feature maps. Specifically, the proposed up-sampling module comprises two main operations: (1) increasing the spatial resolution of feature maps with pixel shuffle and (2) recalibrating the channel-wise high-resolution feature response. The proposed module can be integrated into both CNN and Transformer segmentation architectures. Extensive experiments on three biomedical image datasets of different modalities, including computed tomography (CT), magnetic resonance imaging (MRI) and micro-optical sectioning tomography (MOST) images, demonstrate that the proposed method effectively improves the performance of representative segmentation models.
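As a rough illustration of the two operations named in the abstract, here is a minimal PyTorch sketch. It is our own reading of the description, not the authors' implementation: we assume a 1x1 convolution expands the channels before pixel shuffle, and a squeeze-and-excitation-style gate (an assumption for the channel enhancement step) recalibrates the high-resolution response.

```python
import torch
import torch.nn as nn


class ChannelGate(nn.Module):
    """SE-style channel recalibration (our assumption for step 2)."""

    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        # Global average pool -> per-channel weights in (0, 1).
        w = self.fc(self.pool(x).flatten(1))
        return x * w.view(x.size(0), -1, 1, 1)


class PCEUpsample(nn.Module):
    """Sketch of the two PCE operations: (1) pixel-shuffle up-sampling,
    (2) channel-wise recalibration of the high-resolution response."""

    def __init__(self, in_channels: int, out_channels: int, scale: int = 2):
        super().__init__()
        # Expand channels so pixel shuffle yields out_channels at scale x.
        self.expand = nn.Conv2d(in_channels, out_channels * scale ** 2, kernel_size=1)
        self.shuffle = nn.PixelShuffle(scale)
        self.gate = ChannelGate(out_channels)

    def forward(self, x):
        return self.gate(self.shuffle(self.expand(x)))


x = torch.randn(1, 64, 32, 32)
print(PCEUpsample(64, 32)(x).shape)  # torch.Size([1, 32, 64, 64])
```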


Data Availability

The authors confirm that the Synapse and MSD data supporting the findings of this study are available within the article; the MOST data are not publicly available.

Code Availability

Not applicable.

Notes

  1. https://www.synapse.org/#!Synapse:syn3193805/wiki/217789.

  2. http://medicaldecathlon.com/.

Acknowledgements

This work is supported by the Guangdong Provincial Key Laboratory of Human Digital Twin (No. 2022B1212010004), the Open-Fund of WNLO (Grant No. 2018WNLOKF027) and the Graduate Innovative Fund of Wuhan Institute of Technology (No. CX2022349). We thank the Optical Bioimaging Core Facility of WNLO-HUST for the support in MOST data acquisition.

Funding

Funding for this study was received from the Guangdong Provincial Key Laboratory of Human Digital Twin (No. 2022B1212010004), the Fundamental Research Funds for the Central Universities of China (Grant No. PA2023IISL0095) and the Graduate Innovative Fund of Wuhan Institute of Technology (No. CX2022349).

Author information

Contributions

Not applicable.

Corresponding author

Correspondence to Guoping Xu.

Ethics declarations

Conflict of interest

We confirm that there are no known conflicts of interest associated with this publication and there has been no significant financial support for this work that could have influenced its outcome.

Ethics approval

Not applicable.

Consent to participate

Not applicable.

Consent for publication

Not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

Table 4 Ablation experiments on different components of the PCE module on the MOST dataset (PS: only the pixel shuffle operation is added; GRE: the global relationship enhancement is added after bilinear up-sampling)
Table 5 Ablation experiments on different attention mechanisms in the PCE module on the MOST dataset (SE and ECA: other attention mechanisms added on top of PS)

This appendix details how to plug the PCE module into other architectures. The PCE module proposed in this paper can be seamlessly integrated into other segmentation architectures, as illustrated in Figs. 5 and 6: it directly replaces the up-sampling module (indicated by the red block and red arrows in the figures) in the U-Net and Fast-SCNN models, improving segmentation accuracy compared with traditional up-sampling modules.
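As a concrete illustration of such a drop-in replacement, the hypothetical U-Net decoder step below swaps a transposed-convolution up-sampler for the PCEUpsample sketch given after the abstract; the skip-connection logic is unchanged. Block names and channel sizes are ours, not the models' actual configurations.

```python
import torch
import torch.nn as nn


class DecoderBlock(nn.Module):
    """Hypothetical U-Net decoder step with the up-sampler swapped out.

    Assumes PCEUpsample from the earlier sketch is in scope.
    """

    def __init__(self, in_channels: int, skip_channels: int, out_channels: int):
        super().__init__()
        # Original up-sampler would be, e.g.:
        #   nn.ConvTranspose2d(in_channels, in_channels // 2, 2, stride=2)
        self.up = PCEUpsample(in_channels, in_channels // 2, scale=2)
        self.conv = nn.Sequential(
            nn.Conv2d(in_channels // 2 + skip_channels, out_channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_channels, out_channels, 3, padding=1),
            nn.ReLU(inplace=True),
        )

    def forward(self, x, skip):
        x = self.up(x)                   # PCE up-sampling instead of deconvolution
        x = torch.cat([x, skip], dim=1)  # usual U-Net skip connection
        return self.conv(x)
```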

The two phases of the PCE module serve different functions, and to determine the contribution of each key component to its success, we performed ablation experiments on both aspects, as shown in Table 4. Integrating the global relationship enhancement into our up-sampling module increases the Dice score by 0.82%. Moreover, the results show that the pixel shuffle (PS) operation in our PCE module is the more critical component for the segmentation task.
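For concreteness, the two Table 4 variants could look like the following sketch (layer choices are our assumptions, reusing ChannelGate from the first sketch):

```python
import torch.nn as nn

# "PS": pixel-shuffle up-sampling alone, with no channel recalibration.
ps_only = nn.Sequential(
    nn.Conv2d(64, 32 * 2 ** 2, kernel_size=1),
    nn.PixelShuffle(2),
)

# "GRE": bilinear up-sampling followed only by the global relationship
# (channel recalibration) step; assumes ChannelGate from the first sketch.
gre_only = nn.Sequential(
    nn.Conv2d(64, 32, kernel_size=1),
    nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
    ChannelGate(32),
)
```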

These experiments demonstrate the effectiveness of the global relationship enhancement. In addition, we conducted ablation experiments on the attention mechanism in the PCE module, as shown in Table 5. Replacing the attention mechanism in the PCE module with SE or ECA in turn decreases the segmentation results, while the attention mechanism proposed in this paper achieves 87.79%. Compared with other attention mechanisms, the proposed global relationship enhancement is thus more advantageous.
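For reference, the ECA variant compared in Table 5 replaces the fully connected squeeze-and-excitation bottleneck with a 1-D convolution across neighbouring channels. The sketch below follows the standard ECA design and is not taken from the paper.

```python
import torch.nn as nn


class ECAGate(nn.Module):
    """ECA-style channel attention: a 1-D convolution over the channel
    dimension replaces the SE fully connected bottleneck (standard ECA
    design; kernel size is a hyperparameter, 3 assumed here)."""

    def __init__(self, kernel_size: int = 3):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.conv = nn.Conv1d(1, 1, kernel_size, padding=kernel_size // 2, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        # (B, C, 1, 1) -> (B, 1, C): convolve across neighbouring channels.
        w = self.pool(x).squeeze(-1).transpose(1, 2)
        w = self.sigmoid(self.conv(w)).transpose(1, 2).unsqueeze(-1)
        return x * w
```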

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Zhang, X., Xu, G., Wu, X. et al. A pixel and channel enhanced up-sampling module for biomedical image segmentation. Machine Vision and Applications 35, 30 (2024). https://doi.org/10.1007/s00138-024-01513-7
