Abstract
Accurate polyp segmentation can help doctors find and resect abnormal tissue and reduce the chance that polyps progress to colorectal cancer. Current polyp segmentation networks still struggle in complicated scenarios where polyps vary widely in shape, size, color, and appearance. In this paper, we propose a convolutional multilayer perceptron (MLP) polyp segmentation network to achieve more accurate polyp segmentation in colonoscopy images. The proposed network adopts a convolutional MLP encoder and enhances low-level features using a parallel self-attention module. Furthermore, instead of directly adding encoder features to the decoder, we introduce a cascaded context aggregation module to aggregate high-level semantic features and low-level local features. Finally, channel guide group reverse attention is used to enhance structural and textural details by mining the relationship between areas and boundary cues. The proposed approach is evaluated on six widely adopted datasets and demonstrates superior performance compared with other state-of-the-art models.
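To make the reverse attention idea concrete, the following is a minimal NumPy sketch of plain reverse attention as it is commonly described in the salient object detection literature: the coarse prediction is passed through a sigmoid and inverted, so that features in regions not yet predicted as foreground are emphasized. It is not the channel guide group reverse attention proposed in this paper, whose details are not given in the abstract; the function names, tensor shapes, and toy inputs are assumptions chosen for illustration only.

```python
import numpy as np


def sigmoid(x):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-x))


def reverse_attention(features, coarse_logits):
    """Weight features by the inverse of a coarse foreground prediction.

    features:      array of shape (C, H, W), hypothetical decoder features
    coarse_logits: array of shape (H, W), hypothetical coarse segmentation logits
    """
    # Confident foreground gets a weight near 0, everything else stays near 1.
    reverse_weight = 1.0 - sigmoid(coarse_logits)
    # Broadcast the (H, W) weight map over the channel axis.
    return features * reverse_weight[None, :, :]


# Toy input (assumed values): two channels on a 2x2 grid.
features = np.ones((2, 2, 2))
coarse_logits = np.array([[10.0, -10.0],
                          [0.0, 0.0]])
out = reverse_attention(features, coarse_logits)
```

In this toy example the location with a strongly positive logit is almost fully suppressed, the location with a strongly negative logit is passed through almost unchanged, and uncertain locations (logit 0) are halved, which is what lets a later refinement stage concentrate on regions the coarse map missed.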
Ethics declarations
Conflict of interest
There are no conflicts of interest in this paper.
About this article
Cite this article
Jin, Y., Hu, Y., Jiang, Z. et al. Polyp segmentation with convolutional MLP. Vis Comput 39, 4819–4837 (2023). https://doi.org/10.1007/s00371-022-02630-y
DOI: https://doi.org/10.1007/s00371-022-02630-y