Abstract
Accurate polyp detection is essential for assisting the clinical diagnosis of rectal cancer. Colonoscopy videos contain richer information than still images, making them a valuable resource for deep learning methods. However, unlike common fixed-camera video, the moving camera in colonoscopy can cause rapid jitter, leading to unstable training for existing video detection models. In this paper, we propose YONA (You Only Need one Adjacent Reference-frame), an efficient end-to-end training framework for video polyp detection. YONA fully exploits the information of one previous adjacent frame and detects polyps in the current frame without multi-frame collaboration. Specifically, for the foreground, YONA adaptively aligns the current frame's channel activation patterns with those of its adjacent reference frame according to their foreground similarity. For the background, YONA performs background dynamic alignment guided by the inter-frame difference to eliminate invalid features produced by drastic spatial jitter. Moreover, YONA applies cross-frame contrastive learning during training, leveraging the ground-truth bounding boxes to improve the model's perception of polyps and background. Quantitative and qualitative experiments on three challenging public benchmarks demonstrate that our proposed YONA outperforms previous state-of-the-art competitors by a large margin in both accuracy and speed.
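As a rough illustration of the foreground alignment idea described above, the sketch below blends a reference frame's channel activations into the current frame's, weighted by the cosine similarity of their pooled channel descriptors. The tensor layout (per-channel flattened activations), the global-average pooling, and the linear blending rule are illustrative assumptions, not the paper's exact module, which also includes background dynamic alignment and cross-frame contrastive learning.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def align_channels(cur, ref):
    """Blend each channel of `cur` with the matching channel of `ref`,
    weighted by the cosine similarity of their pooled channel descriptors.
    `cur` and `ref` are [C][H*W] lists of per-channel activations."""
    # Global-average-pool each frame into a C-dimensional channel descriptor.
    cur_desc = [sum(ch) / len(ch) for ch in cur]
    ref_desc = [sum(ch) / len(ch) for ch in ref]
    # Clamp to [0, 1]: dissimilar frames contribute nothing.
    w = max(0.0, cosine_similarity(cur_desc, ref_desc))
    # Higher similarity -> borrow more activation from the reference frame.
    return [[(1 - w) * c + w * r for c, r in zip(cc, rc)]
            for cc, rc in zip(cur, ref)]
```

When the two frames' descriptors are orthogonal (weight 0), the current frame passes through unchanged, which mirrors the adaptive, similarity-gated nature of the alignment.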
Y. Jiang and Z. Zhang—Equal contribution.
Notes
- 1. Averaged intersection-over-union scores of the target in the nearby frames (\(\pm 10\) frames).
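The footnote's metric can be sketched as follows; the corner-format boxes, the single-box-per-frame assumption, and the boundary handling are illustrative choices, not necessarily the authors' exact implementation.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    x1 = max(box_a[0], box_b[0]); y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2]); y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

def averaged_nearby_iou(boxes, t, window=10):
    """Average IoU between the box at frame t and the boxes in frames
    t - window .. t + window (excluding t itself, clipped to the video)."""
    neighbours = [s for s in range(t - window, t + window + 1)
                  if s != t and 0 <= s < len(boxes)]
    if not neighbours:
        return 0.0
    return sum(iou(boxes[t], boxes[s]) for s in neighbours) / len(neighbours)
```

A perfectly static target scores 1.0, while a target that jumps around the frame scores near 0, so the metric quantifies how jittery a sequence is.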
Acknowledgements
This work was supported in part by Shenzhen General Program No. JCYJ20220530143600001, by the Basic Research Project No. HZQB-KCZYZ-2021067 of Hetao Shenzhen-HK S&T Cooperation Zone, by Shenzhen-Hong Kong Joint Funding No. SGDX20211123112401002, by NSFC Grant No. 62293482, by the Shenzhen Outstanding Talents Training Fund, by Guangdong Research Projects No. 2017ZT07X152 and No. 2019CX01X104, by the Guangdong Provincial Key Laboratory of Future Networks of Intelligence (Grant No. 2022B1212010001), by the Guangdong Provincial Key Laboratory of Big Data Computing, The Chinese University of Hong Kong, Shenzhen, by NSFC Grants No. 61931024 and No. 81922046, by the Shenzhen Key Laboratory of Big Data and Artificial Intelligence (Grant No. ZDSYS201707251409055), by the Key Area R&D Program of Guangdong Province (Grant No. 2018B030338001), by the Zelixir Biotechnology Company Fund, and by the Tencent Open Fund.
Electronic supplementary material
Below are the links to the electronic supplementary material.
Supplementary material 1 (MP4 634 KB)
Supplementary material 2 (MP4 3353 KB)
Supplementary material 3 (MP4 26319 KB)
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Jiang, Y., Zhang, Z., Zhang, R., Li, G., Cui, S., Li, Z. (2023). YONA: You Only Need One Adjacent Reference-Frame for Accurate and Fast Video Polyp Detection. In: Greenspan, H., et al. (eds.) Medical Image Computing and Computer Assisted Intervention – MICCAI 2023. MICCAI 2023. Lecture Notes in Computer Science, vol 14224. Springer, Cham. https://doi.org/10.1007/978-3-031-43904-9_5
Print ISBN: 978-3-031-43903-2
Online ISBN: 978-3-031-43904-9
eBook Packages: Computer Science; Computer Science (R0)