Abstract
Accurate polyp detection is essential for assisting the clinical diagnosis of rectal cancer. Colonoscopy videos contain richer information than still images, making them a valuable resource for deep learning methods. However, unlike common fixed-camera video, the moving camera in colonoscopy can cause rapid jitter, leading to unstable training for existing video detection models. In this paper, we propose YONA (You Only Need one Adjacent Reference-frame), an efficient end-to-end training framework for video polyp detection. YONA fully exploits the information of one previous adjacent frame and detects polyps in the current frame without multi-frame collaboration. Specifically, for the foreground, YONA adaptively aligns the current frame's channel activation patterns with those of its adjacent reference frame according to their foreground similarity. For the background, YONA performs background dynamic alignment guided by the inter-frame difference to eliminate invalid features produced by drastic spatial jitter. Moreover, YONA applies cross-frame contrastive learning during training, leveraging the ground-truth bounding boxes to improve the model's perception of polyps and background. Quantitative and qualitative experiments on three challenging public benchmarks demonstrate that our proposed YONA outperforms previous state-of-the-art competitors by a large margin in both accuracy and speed.
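As a rough illustration of the foreground alignment idea described above, the sketch below blends a reference frame's channel activations into the current frame's, weighted by the cosine similarity of their pooled channel descriptors. The tensor layout (per-channel flattened activations), the global-average pooling, and the linear blending rule are illustrative assumptions, not the paper's exact module, which also includes background dynamic alignment and cross-frame contrastive learning.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def align_channels(cur, ref):
    """Blend each channel of `cur` with the matching channel of `ref`,
    weighted by the cosine similarity of their pooled channel descriptors.
    `cur` and `ref` are [C][H*W] lists of per-channel activations."""
    # Global-average-pool each frame into a C-dimensional channel descriptor.
    cur_desc = [sum(ch) / len(ch) for ch in cur]
    ref_desc = [sum(ch) / len(ch) for ch in ref]
    # Clamp to [0, 1]: dissimilar frames contribute nothing.
    w = max(0.0, cosine_similarity(cur_desc, ref_desc))
    # Higher similarity -> borrow more activation from the reference frame.
    return [[(1 - w) * c + w * r for c, r in zip(cc, rc)]
            for cc, rc in zip(cur, ref)]
```

When the two frames' descriptors are orthogonal (weight 0), the current frame passes through unchanged, which mirrors the adaptive, similarity-gated nature of the alignment.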
Y. Jiang and Z. Zhang—Equal contribution.
Notes
- 1. Averaged intersection-over-union scores of the target in the nearby frames (\(\pm 10\) frames).
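The footnote's metric can be sketched as follows; the corner-format boxes, the single-box-per-frame assumption, and the boundary handling are illustrative choices, not necessarily the authors' exact implementation.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    x1 = max(box_a[0], box_b[0]); y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2]); y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

def averaged_nearby_iou(boxes, t, window=10):
    """Average IoU between the box at frame t and the boxes in frames
    t - window .. t + window (excluding t itself, clipped to the video)."""
    neighbours = [s for s in range(t - window, t + window + 1)
                  if s != t and 0 <= s < len(boxes)]
    if not neighbours:
        return 0.0
    return sum(iou(boxes[t], boxes[s]) for s in neighbours) / len(neighbours)
```

A perfectly static target scores 1.0, while a target that jumps around the frame scores near 0, so the metric quantifies how jittery a sequence is.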
Acknowledgements
This work was supported in part by Shenzhen General Program No. JCYJ20220530143600001, by the Basic Research Project No. HZQB-KCZYZ-2021067 of Hetao Shenzhen-HK S&T Cooperation Zone, by Shenzhen-Hong Kong Joint Funding No. SGDX20211123112401002, by NSFC Grant No. 62293482, by the Shenzhen Outstanding Talents Training Fund, by Guangdong Research Projects No. 2017ZT07X152 and No. 2019CX01X104, by the Guangdong Provincial Key Laboratory of Future Networks of Intelligence (Grant No. 2022B1212010001), by the Guangdong Provincial Key Laboratory of Big Data Computing, The Chinese University of Hong Kong, Shenzhen, by NSFC Grants No. 61931024 and No. 81922046, by the Shenzhen Key Laboratory of Big Data and Artificial Intelligence (Grant No. ZDSYS201707251409055), by the Key Area R&D Program of Guangdong Province (Grant No. 2018B030338001), by the Zelixir Biotechnology Company Fund, and by the Tencent Open Fund.
Electronic supplementary material
Below are the links to the electronic supplementary material.
Supplementary material 1 (MP4 634 KB)
Supplementary material 2 (MP4 3353 KB)
Supplementary material 3 (MP4 26319 KB)
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Jiang, Y., Zhang, Z., Zhang, R., Li, G., Cui, S., Li, Z. (2023). YONA: You Only Need One Adjacent Reference-Frame for Accurate and Fast Video Polyp Detection. In: Greenspan, H., et al. (eds.) Medical Image Computing and Computer Assisted Intervention – MICCAI 2023. MICCAI 2023. Lecture Notes in Computer Science, vol 14224. Springer, Cham. https://doi.org/10.1007/978-3-031-43904-9_5
Print ISBN: 978-3-031-43903-2
Online ISBN: 978-3-031-43904-9
eBook Packages: Computer Science; Computer Science (R0)