Abstract
Existing video polyp segmentation (VPS) models typically employ convolutional neural networks (CNNs) to extract features. However, due to their limited receptive fields, CNNs cannot fully exploit the global temporal and spatial information in successive video frames, which leads to false-positive segmentation results. In this paper, we propose PNS-Net (Progressively Normalized Self-attention Network), which efficiently learns representations from polyp videos at real-time speed (\(\sim\)140 fps) on a single RTX 2080 GPU without post-processing. PNS-Net is built on a basic normalized self-attention block and is equipped with recurrence and CNNs. Experiments on challenging VPS datasets demonstrate that PNS-Net achieves state-of-the-art performance. We also conduct extensive experiments to study the effectiveness of the channel split, soft-attention, and progressive learning strategy. We find that PNS-Net works well under different settings, making it a promising solution to the VPS task.
G.-P. Ji and Y.-C. Chou contributed equally. Code: http://dpfan.net/pnsnet/.
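To make the terms "normalized self-attention" and "channel split" in the abstract more concrete, the following is a minimal sketch and not the authors' implementation. It assumes plain scaled dot-product attention over spatio-temporal tokens, layer normalization applied to the queries, no learned projections, and an equal split of the channels into groups. The function names (`layer_norm`, `attend`, `channel_split_attention`) and the toy input are hypothetical; the actual block in PNS-Net may differ in every one of these choices.

```python
import math

def layer_norm(vec, eps=1e-5):
    """Normalize one feature vector to zero mean and unit variance."""
    mean = sum(vec) / len(vec)
    var = sum((v - mean) ** 2 for v in vec) / len(vec)
    return [(v - mean) / math.sqrt(var + eps) for v in vec]

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(tokens):
    """Scaled dot-product self-attention with layer-normalized queries.

    tokens: list of feature vectors, one per spatio-temporal position.
    Assumption: queries, keys, and values are the tokens themselves
    (no learned projections), purely for illustration.
    """
    dim = len(tokens[0])
    queries = [layer_norm(t) for t in tokens]
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in tokens]
        weights = softmax(scores)
        # Each output channel is a weighted average of that channel over all tokens.
        out.append([sum(w * v[c] for w, v in zip(weights, tokens)) for c in range(dim)])
    return out

def channel_split_attention(tokens, groups):
    """Split channels into `groups` equal parts, attend per part, concatenate."""
    dim = len(tokens[0])
    assert dim % groups == 0, "channel count must be divisible by groups"
    step = dim // groups
    outputs = [[] for _ in tokens]
    for g in range(groups):
        part = [t[g * step:(g + 1) * step] for t in tokens]
        for row, attended in zip(outputs, attend(part)):
            row.extend(attended)
    return outputs

# Toy input (hypothetical): 2 frames x 3 positions = 6 tokens, 4 channels each.
tokens = [[float((i * 7 + c * 3) % 5) for c in range(4)] for i in range(6)]
result = channel_split_attention(tokens, groups=2)
print(len(result), len(result[0]))  # prints: 6 4
```

Because every token attends to every other token across frames, the output at each position can draw on the whole clip, which is the property the abstract contrasts with the limited receptive field of CNNs.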
Notes
1. We set \(H^{l}=\frac{H'}{4}\), \(W^{l}=\frac{W'}{4}\), \(C^{l}=24\), \(H^{h}=\frac{H'}{8}\), \(W^{h}=\frac{W'}{8}\), and \(C^{h}=32\).
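The settings in the note above can be checked with a few lines of arithmetic. This is a small sketch that assumes \(H'\) and \(W'\) denote the height and width of the input frame (this excerpt does not state that) and that both are divisible by 8. The helper name `feature_shapes` and the example size 256 by 448 are hypothetical.

```python
def feature_shapes(height, width):
    """Return (H, W, C) for the low- and high-level features in Note 1.

    Assumption: height and width are divisible by 8, so integer
    division matches the fractions H'/4 and H'/8 exactly.
    """
    low = (height // 4, width // 4, 24)    # (H^l, W^l, C^l)
    high = (height // 8, width // 8, 32)   # (H^h, W^h, C^h)
    return low, high

low, high = feature_shapes(256, 448)  # hypothetical input size
print(low, high)  # prints: (64, 112, 24) (32, 56, 32)
```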
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Ji, GP. et al. (2021). Progressively Normalized Self-Attention Network for Video Polyp Segmentation. In: de Bruijne, M., et al. Medical Image Computing and Computer Assisted Intervention – MICCAI 2021. Lecture Notes in Computer Science, vol 12901. Springer, Cham. https://doi.org/10.1007/978-3-030-87193-2_14
DOI: https://doi.org/10.1007/978-3-030-87193-2_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-87192-5
Online ISBN: 978-3-030-87193-2
eBook Packages: Computer Science, Computer Science (R0)