Abstract
Interactive image segmentation extracts specific targets that match the user's intention and has received widespread attention in computer vision. Conventional interactive methods rely heavily on user interaction because of the limitations of hand-crafted low-level features. Recently, deep interactive approaches have significantly improved segmentation performance thanks to their semantic perception ability. However, these approaches generally treat every interaction independently and in the same way, regardless of the intention behind each click and the potential relationships among consecutive interactions. As a result, they remain constrained by the conflict between segmentation quality and the number of interactions. To overcome this problem, this paper addresses the click-based interactive segmentation task by explicitly mining the intention of each click and linking the relationships among all clicks. A selection-correction training framework is first established to impose the global object selection and local error correction roles throughout the interaction process. Then a temporal network architecture is designed to connect the entire click sequence continuously. In this way, each click can play its respective role as fully as possible, and the spatially varying segmentation cues can be propagated along the time series. Experiments on the challenging SBD, GrabCut, DAVIS and Berkeley segmentation datasets demonstrate the effectiveness of the proposed method.
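To make the idea of an ordered click sequence with per-click roles more concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes that each click is a (row, col, is_positive) tuple, that clicks are encoded as binary disk maps, and that the first click takes the selection role while later clicks take the correction role. The function names, the disk radius, and the decayed running sum standing in for a learned temporal network are all hypothetical choices made only for illustration.

```python
import numpy as np


def click_to_disk_map(shape, click, radius=5):
    """Binary map with a filled disk of `radius` pixels around one (row, col) click."""
    rows, cols = np.ogrid[:shape[0], :shape[1]]
    r, c = click
    return ((rows - r) ** 2 + (cols - c) ** 2 <= radius ** 2).astype(np.float32)


def encode_click_sequence(shape, clicks, radius=5):
    """Encode an ordered click list as a (T, 3, H, W) array.

    Channels: 0 = positive-click map, 1 = negative-click map,
    2 = role flag (assumption: 1.0 for the first, "selection" click,
    0.0 for every later, "correction" click).
    """
    frames = []
    for t, (r, c, is_positive) in enumerate(clicks):
        disk = click_to_disk_map(shape, (r, c), radius)
        empty = np.zeros(shape, dtype=np.float32)
        pos = disk if is_positive else empty
        neg = empty if is_positive else disk
        role = np.full(shape, 1.0 if t == 0 else 0.0, dtype=np.float32)
        frames.append(np.stack([pos, neg, role]))
    return np.stack(frames)


def propagate(frames, decay=0.9):
    """Toy stand-in for a temporal network: carry a state map across the clicks.

    A learned recurrent model would replace this decayed running sum; it is
    used here only to show that clicks are consumed in order and that earlier
    cues persist into later steps.
    """
    state = np.zeros(frames.shape[2:], dtype=np.float32)
    for pos, neg, _role in frames:
        state = decay * state + pos - neg
    return state


# One positive "selection" click followed by one negative "correction" click.
clicks = [(10, 10, True), (20, 20, False)]
frames = encode_click_sequence((32, 32), clicks)
state = propagate(frames)
```

In this sketch the state map remains positive around the first click and becomes negative around the second, which mirrors, in a very simplified way, how cues from earlier clicks could be carried forward while later clicks correct local errors.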
Data availability
The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.
Acknowledgments
This work was supported in part by the National Natural Science Foundation of China under Grants 62172221 and 62072241, and in part by the Fundamental Research Funds for the Central Universities under Grant No. JSGP202204.
Ethics declarations
Conflict of interest
We declare that we have no conflicts of interest.
About this article
Cite this article
Li, Y., Wang, T., Ji, Z. et al. Spatiotemporal consistent selection-correction network for deep interactive image segmentation. Neural Comput & Applic 35, 9725–9738 (2023). https://doi.org/10.1007/s00521-023-08210-y
DOI: https://doi.org/10.1007/s00521-023-08210-y