Spatiotemporal consistent selection-correction network for deep interactive image segmentation

  • Original Article
Neural Computing and Applications

Abstract

Interactive image segmentation extracts specific targets that match the user's intention and has received widespread attention in computer vision. Conventional interactive methods rely heavily on user interaction because they are limited by hand-crafted low-level features. Recently, deep interactive approaches have significantly improved segmentation performance thanks to their semantic perception ability. However, these approaches generally treat each interaction independently and in the same way, regardless of the intention behind each click and the potential relationships among successive interactions. These shortcomings leave them constrained by the conflict between interaction quality and the number of interactions. To overcome this problem, this paper focuses on the click-based interactive segmentation task by explicitly mining the intention of each click and linking the relationships among all clicks. A selection-correction training framework is first established to impose the global object selection and local error correction roles throughout the interaction process. A temporal network architecture is then designed to connect the entire click sequence continuously. In this way, each click can play its respective role as fully as possible, and the spatially varying segmentation cues can be propagated through the time series. Experiments on the challenging SBD, GrabCut, DAVIS and Berkeley segmentation datasets demonstrate the effectiveness of the proposed method.
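
The two click roles described above can be illustrated with a small sketch. It is not the authors' network or training code. It only assumes a click-simulation protocol that is common in click-based interactive segmentation: the first click selects the object, and each later click lands on the larger set of mislabeled pixels to correct it. The names `error_pixels`, `next_click` and `click_role` are hypothetical, and masks are plain nested lists so the example needs no external libraries.

```python
from typing import List, Optional, Tuple

Mask = List[List[int]]          # 1 = foreground, 0 = background
Click = Tuple[int, int, bool]   # (row, col, is_positive)


def error_pixels(gt: Mask, pred: Mask):
    """Split mislabeled pixels into false negatives and false positives."""
    fn, fp = [], []
    for r, (g_row, p_row) in enumerate(zip(gt, pred)):
        for c, (g, p) in enumerate(zip(g_row, p_row)):
            if g == 1 and p == 0:
                fn.append((r, c))
            elif g == 0 and p == 1:
                fp.append((r, c))
    return fn, fp


def next_click(gt: Mask, pred: Mask) -> Optional[Click]:
    """Simulate one click on the larger error set (assumed protocol).

    Returns None when the prediction already matches the ground truth.
    """
    fn, fp = error_pixels(gt, pred)
    if not fn and not fp:
        return None
    region, positive = (fn, True) if len(fn) >= len(fp) else (fp, False)
    # Click the error pixel nearest to the centroid of the chosen set.
    cr = sum(r for r, _ in region) / len(region)
    cc = sum(c for _, c in region) / len(region)
    r, c = min(region, key=lambda p: (p[0] - cr) ** 2 + (p[1] - cc) ** 2)
    return (r, c, positive)


def click_role(index: int) -> str:
    """First click selects the object; later clicks correct local errors."""
    return "selection" if index == 0 else "correction"


# Toy 4x4 example: a 2x2 object and an over-segmented prediction.
gt = [[0, 0, 0, 0],
      [0, 1, 1, 0],
      [0, 1, 1, 0],
      [0, 0, 0, 0]]
empty = [[0] * 4 for _ in range(4)]
over = [[0, 0, 0, 0],
        [0, 1, 1, 1],
        [0, 1, 1, 1],
        [0, 0, 0, 0]]

print(click_role(0), next_click(gt, empty))  # positive click inside the object
print(click_role(1), next_click(gt, over))   # negative click on the spill-over
```

In a temporal design such as the one the abstract describes, the mask predicted after each click would be carried forward as state for the next click. The sketch leaves that model out and only shows how a click sequence with distinct roles could be generated.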

Data availability

The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.

Acknowledgments

This work was supported in part by the National Natural Science Foundation of China under Grants 62172221 and 62072241, and in part by the Fundamental Research Funds for the Central Universities under Grant No. JSGP202204.

Author information

Corresponding author

Correspondence to Tao Wang.

Ethics declarations

Conflict of interest

We declare that we have no conflicts of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Li, Y., Wang, T., Ji, Z. et al. Spatiotemporal consistent selection-correction network for deep interactive image segmentation. Neural Comput & Applic 35, 9725–9738 (2023). https://doi.org/10.1007/s00521-023-08210-y

  • DOI: https://doi.org/10.1007/s00521-023-08210-y
