
Polyp segmentation with convolutional MLP

Original article · The Visual Computer

Abstract

Accurate polyp segmentation helps doctors locate and resect abnormal tissue, reducing the chance that polyps develop into colorectal cancer. Current polyp segmentation networks still struggle in complicated scenarios where polyps vary widely in shape, size, color, and appearance. In this paper, we propose a convolutional multilayer perceptron (MLP) polyp segmentation network that achieves more accurate polyp segmentation in colonoscopy images. The proposed network adopts a convolutional MLP encoder and enhances low-level features with a parallel self-attention module. Furthermore, instead of directly adding encoder features to the decoder, we introduce a cascaded context aggregation module that fuses high-level semantic features with low-level local features. Finally, channel-guided group reverse attention enhances structural and textural details by mining the relationship between region areas and boundary cues. The proposed approach is evaluated on six widely adopted datasets and demonstrates superior performance compared to other state-of-the-art models.
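
For readers who want a concrete picture of how such a pipeline fits together, below is a minimal, hypothetical PyTorch sketch of an encoder with convolutional-MLP stages, parallel attention on low-level features, multi-scale feature aggregation, and a reverse-attention refinement step. All module names, interfaces, and internals (ConvMLPBlock, ParallelSelfAttention, PolypSegNet) are illustrative assumptions and do not reproduce the architecture described in the paper.

```python
# Hypothetical sketch of an encoder-decoder polyp segmentation pipeline in the
# spirit of the abstract. Module names and internals are assumptions for
# illustration only, not the authors' implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ConvMLPBlock(nn.Module):
    """A lightweight convolutional-MLP stage: depthwise conv for spatial mixing,
    1x1 convs (a channel-wise MLP) for channel mixing."""
    def __init__(self, channels):
        super().__init__()
        self.spatial = nn.Conv2d(channels, channels, 3, padding=1, groups=channels)
        self.channel_mlp = nn.Sequential(
            nn.Conv2d(channels, channels * 2, 1), nn.GELU(),
            nn.Conv2d(channels * 2, channels, 1),
        )

    def forward(self, x):
        x = x + self.spatial(x)
        return x + self.channel_mlp(x)


class ParallelSelfAttention(nn.Module):
    """Channel and spatial attention applied in parallel to low-level features."""
    def __init__(self, channels):
        super().__init__()
        self.channel_gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels, 1), nn.Sigmoid(),
        )
        self.spatial_gate = nn.Sequential(nn.Conv2d(channels, 1, 7, padding=3), nn.Sigmoid())

    def forward(self, x):
        return x * self.channel_gate(x) + x * self.spatial_gate(x)


class PolypSegNet(nn.Module):
    """Encoder -> low-level enhancement -> context aggregation -> reverse attention."""
    def __init__(self, channels=(64, 128, 256)):
        super().__init__()
        c1, c2, c3 = channels
        self.stem = nn.Conv2d(3, c1, 3, stride=2, padding=1)
        self.stage1 = ConvMLPBlock(c1)
        self.down2 = nn.Conv2d(c1, c2, 3, stride=2, padding=1)
        self.stage2 = ConvMLPBlock(c2)
        self.down3 = nn.Conv2d(c2, c3, 3, stride=2, padding=1)
        self.stage3 = ConvMLPBlock(c3)

        self.low_level_attn = ParallelSelfAttention(c1)
        # Aggregation: project every scale to a common width and fuse.
        self.reduce = nn.ModuleList([nn.Conv2d(c, c1, 1) for c in channels])
        self.fuse = nn.Conv2d(c1 * 3, c1, 3, padding=1)
        self.coarse_head = nn.Conv2d(c1, 1, 1)
        # Reverse attention: re-weight features by (1 - coarse prediction)
        # so refinement focuses on boundary regions the coarse map missed.
        self.refine = nn.Conv2d(c1, 1, 3, padding=1)

    def forward(self, x):
        f1 = self.stage1(self.stem(x))
        f2 = self.stage2(self.down2(f1))
        f3 = self.stage3(self.down3(f2))

        f1 = self.low_level_attn(f1)
        size = f1.shape[-2:]
        feats = [self.reduce[i](f) for i, f in enumerate((f1, f2, f3))]
        feats = [F.interpolate(f, size=size, mode="bilinear", align_corners=False) for f in feats]
        fused = self.fuse(torch.cat(feats, dim=1))

        coarse = self.coarse_head(fused)
        reverse = (1.0 - torch.sigmoid(coarse)) * fused
        fine = self.refine(reverse) + coarse
        return F.interpolate(fine, size=x.shape[-2:], mode="bilinear", align_corners=False)


if __name__ == "__main__":
    logits = PolypSegNet()(torch.randn(1, 3, 352, 352))
    print(logits.shape)  # torch.Size([1, 1, 352, 352])
```

The reverse-attention step follows the general idea, common in the salient object detection literature, of weighting features by the complement of a coarse prediction so that the refinement branch attends to missed regions near boundaries.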



Author information


Corresponding author

Correspondence to Yan Jin.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Jin, Y., Hu, Y., Jiang, Z. et al. Polyp segmentation with convolutional MLP. Vis Comput 39, 4819–4837 (2023). https://doi.org/10.1007/s00371-022-02630-y
