
CGAN: lightweight and feature aggregation network for high-performance interactive image segmentation

  • Original article
  • The Visual Computer

Abstract

In interactive image segmentation, user interactions indicating the object of interest are used to predict a segmentation mask. Recent works have achieved state-of-the-art results using either backpropagating refinement or iterative training schemes, both of which are computationally expensive. In this paper, we propose a novel method for interactive image segmentation that uses conditional generative adversarial networks to enforce higher-order consistency in the segmentation, without extra post-processing during inference. Concretely, we develop a new segmentation network that integrates three modules, providing global contextual information and attention and fusing features across multiple layers. This allows the segmentation network to learn strong object representations and predict more accurate segmentations. We then employ a fully convolutional discriminator to detect and correct higher-order inconsistencies between the predictions of the segmentation network and the ground-truth label maps. To this end, we optimize an objective function that combines the conventional segmentation loss with an adversarial loss. We train our network on the Pascal VOC 2012 and MS COCO 2017 datasets and conduct comprehensive experiments on four benchmark datasets. Experimental results show that adding adversarial training to the network architecture improves segmentation results over state-of-the-art methods while keeping the system efficient in terms of speed.
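The abstract describes an objective that combines a conventional segmentation loss with an adversarial term supplied by a fully convolutional discriminator. As a minimal sketch of that idea (not the paper's exact formulation, which the abstract does not give), the generator's loss can be written as a per-pixel binary cross-entropy against the ground-truth mask plus a weighted term that rewards predictions the discriminator judges to be real; the function and parameter names (`combined_loss`, `lam`) are illustrative:

```python
import numpy as np

def combined_loss(pred, target, disc_conf, lam=0.1, eps=1e-7):
    """Sketch of a segmentation loss with an adversarial term.

    pred      : predicted foreground probabilities in (0, 1), shape (H, W)
    target    : ground-truth binary mask, shape (H, W)
    disc_conf : discriminator's per-pixel confidence that `pred` looks
                like a ground-truth mask, shape (H, W)
    lam       : weight of the adversarial term (illustrative value)
    """
    pred = np.clip(pred, eps, 1 - eps)
    disc_conf = np.clip(disc_conf, eps, 1 - eps)
    # Conventional segmentation loss: per-pixel binary cross-entropy.
    l_seg = -np.mean(target * np.log(pred)
                     + (1 - target) * np.log(1 - pred))
    # Adversarial term: the segmentation network is rewarded when the
    # fully convolutional discriminator labels its prediction as "real".
    l_adv = -np.mean(np.log(disc_conf))
    return l_seg + lam * l_adv
```

Under this sketch, minimizing the combined loss pushes the prediction both toward the ground truth pixel-wise and toward masks the discriminator cannot distinguish from real label maps, which is how the adversarial term enforces higher-order consistency without any post-processing at inference time.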


Data availability

Data will be made available on reasonable request.

References

  1. Boykov, Y., Jolly, M.: Interactive graph cuts for optimal boundary and region segmentation of objects in N-D images. In: Proceedings of the Eighth International Conference on Computer Vision (ICCV-01), Vancouver, British Columbia, Canada, July 7–14, 2001, vol. 1, pp. 105–112 (2001)

  2. Grady, L.J.: Random walks for image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 28(11), 1768–1783 (2006)

  3. Rother, C., Kolmogorov, V., Blake, A.: "GrabCut": interactive foreground extraction using iterated graph cuts. ACM Trans. Graph. 23(3), 309–314 (2004)

  4. Gulshan, V., Rother, C., Criminisi, A., Blake, A., Zisserman, A.: Geodesic star convexity for interactive image segmentation. In: The Twenty-Third IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2010, San Francisco, CA, USA, 13–18 June 2010, pp. 3129–3136 (2010)

  5. Price, B.L., Morse, B.S., Cohen, S.: Geodesic graph cut for interactive image segmentation. In: The Twenty-Third IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2010, San Francisco, CA, USA, 13–18 June 2010, pp. 3161–3168 (2010)

  6. Cheng, M., Prisacariu, V.A., Zheng, S., Torr, P.H.S., Rother, C.: DenseCut: densely connected CRFs for realtime grabcut. Comput. Graph. Forum 34(7), 193–201 (2015)

  7. Boykov, Y., Veksler, O., Zabih, R.: Fast approximate energy minimization via graph cuts. In: Proceedings of the International Conference on Computer Vision, Kerkyra, Corfu, Greece, September 20–25, 1999, pp. 377–384 (1999)

  8. Kolmogorov, V., Zabih, R.: What energy functions can be minimized via graph cuts? IEEE Trans. Pattern Anal. Mach. Intell. 26(2), 147–159 (2004)

  9. He, X., Gould, S.: An exemplar-based CRF for multi-instance object segmentation. In: 2014 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2014, Columbus, OH, USA, June 23–28, 2014, pp. 296–303 (2014)

  10. Xu, N., Price, B.L., Cohen, S., Yang, J., Huang, T.S.: Deep interactive object selection. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27–30, 2016, pp. 373–381 (2016)

  11. Liew, J.H., Wei, Y., Xiong, W., Ong, S.H., Feng, J.: Regional interactive image segmentation networks. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22–29, 2017, pp. 2746–2754 (2017)

  12. Maninis, K., Caelles, S., Pont-Tuset, J., Gool, L.V.: Deep extreme cut: from extreme points to object segmentation. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18–22, 2018, pp. 616–625 (2018)

  13. Li, Z., Chen, Q., Koltun, V.: Interactive image segmentation with latent diversity. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18–22, 2018, pp. 577–585 (2018)

  14. Hu, Y., Soltoggio, A., Lock, R., Carter, S.: A fully convolutional two-stream fusion network for interactive image segmentation. Neural Netw. 109, 31–42 (2019)

  15. Lin, Z., Zhang, Z., Chen, L., Cheng, M., Lu, S.: Interactive image segmentation with first click attention. In: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2020, Seattle, WA, USA, June 13–19, 2020, pp. 13336–13345 (2020)

  16. Mahadevan, S., Voigtlaender, P., Leibe, B.: Iteratively trained interactive segmentation. In: British Machine Vision Conference 2018, BMVC 2018, Newcastle, UK, September 3–6, 2018, p. 212 (2018)

  17. Sofiiuk, K., Petrov, I.A., Konushin, A.: Reviving iterative training with mask guidance for interactive segmentation. In: 2022 IEEE International Conference on Image Processing, ICIP 2022, Bordeaux, France, 16–19 October 2022, pp. 3141–3145 (2022)

  18. Jang, W., Kim, C.: Interactive image segmentation via backpropagating refinement scheme. In: IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2019, Long Beach, CA, USA, June 16–20, 2019, pp. 5297–5306 (2019)

  19. Sofiiuk, K., Petrov, I.A., Barinova, O., Konushin, A.: F-BRS: rethinking backpropagating refinement for interactive segmentation. In: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2020, Seattle, WA, USA, June 13–19, 2020, pp. 8620–8629 (2020)

  20. Majumder, S., Yao, A.: Content-aware multi-level guidance for interactive instance segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2019, Long Beach, CA, USA, June 16–20, 2019, pp. 11602–11611 (2019)

  21. Acuna, D., Ling, H., Kar, A., Fidler, S.: Efficient interactive annotation of segmentation datasets with polygon-RNN++. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18–22, 2018, pp. 859–868 (2018)

  22. Chen, X., Zhao, Z., Yu, F., Zhang, Y., Duan, M.: Conditional diffusion for interactive segmentation. In: 2021 IEEE/CVF International Conference on Computer Vision, ICCV 2021, Montreal, QC, Canada, October 10–17, 2021, pp. 7325–7334 (2021)

  23. Hao, Y., Liu, Y., Wu, Z., Han, L., Chen, Y., Chen, G., Chu, L., Tang, S., Yu, Z., Chen, Z., Lai, B.: Edgeflow: achieving practical interactive segmentation with edge-guided flow. In: IEEE/CVF International Conference on Computer Vision Workshops, ICCVW 2021, Montreal, BC, Canada, October 11–17, 2021, pp. 1551–1560 (2021)

  24. Chen, X., Zhao, Z., Zhang, Y., Duan, M., Qi, D., Zhao, H.: Focalclick: towards practical interactive image segmentation. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022, New Orleans, LA, USA, June 18–24, 2022, pp. 1290–1299 (2022)

  25. Liu, Q., Xu, Z., Jiao, Y., Niethammer, M.: iSegFormer: interactive segmentation via transformers with application to 3D knee MR images. In: Medical Image Computing and Computer Assisted Intervention—MICCAI 2022—25th International Conference, Singapore, September 18–22, 2022, Proceedings, Part V, vol. 13435, pp. 464–474 (2022)

  26. Zhang, S., Liew, J.H., Wei, Y., Wei, S., Zhao, Y.: Interactive object segmentation with inside–outside guidance. In: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2020, Seattle, WA, USA, June 13–19, 2020, pp. 12231–12241 (2020)

  27. Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A.C., Bengio, Y.: Generative adversarial nets. In: Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014, December 8–13 2014, Montreal, Quebec, Canada, pp. 2672–2680 (2014)

  28. Wang, L., Sun, Y., Wang, Z.: CCS-GAN: a semi-supervised generative adversarial network for image classification. Vis. Comput. 38(6), 2009–2021 (2022)

  29. Ledig, C., Theis, L., Huszar, F., Caballero, J., Cunningham, A., Acosta, A., Aitken, A.P., Tejani, A., Totz, J., Wang, Z., Shi, W.: Photo-realistic single image super-resolution using a generative adversarial network. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21–26, 2017, pp. 105–114 (2017)

  30. Luc, P., Couprie, C., Chintala, S., Verbeek, J.: Semantic segmentation using adversarial networks. CoRR arxiv:1611.08408 (2016)

  31. Souly, N., Spampinato, C., Shah, M.: Semi and weakly supervised semantic segmentation using generative adversarial network. CoRR arxiv:1703.09695 (2017)

  32. Hung, W., Tsai, Y., Liou, Y., Lin, Y., Yang, M.: Adversarial learning for semi-supervised semantic segmentation. In: British Machine Vision Conference 2018, BMVC 2018, Newcastle, UK, September 3–6, 2018, p. 65 (2018)

  33. Mirza, M., Osindero, S.: Conditional generative adversarial nets. CoRR arxiv:1411.1784 (2014)

  34. Reed, S.E., Akata, Z., Yan, X., Logeswaran, L., Schiele, B., Lee, H.: Generative adversarial text to image synthesis. In: Proceedings of the 33rd International Conference on Machine Learning, ICML 2016, New York City, NY, USA, June 19–24, 2016, vol. 48, pp. 1060–1069 (2016)

  35. Isola, P., Zhu, J., Zhou, T., Efros, A.A.: Image-to-image translation with conditional adversarial networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21–26, 2017, pp. 5967–5976 (2017)

  36. Adachi, H., Fukui, H., Yamashita, T., Fujiyoshi, H.: Facial image generation by generative adversarial networks using weighted conditions. In: Proceedings of the 14th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications, VISIGRAPP 2019, Volume 4: VISAPP, Prague, Czech Republic, February 25–27, 2019, pp. 139–145 (2019)

  37. Ci, Y., Ma, X., Wang, Z., Li, H., Luo, Z.: User-guided deep anime line art colorization with conditional adversarial networks. In: 2018 ACM Multimedia Conference on Multimedia Conference, MM 2018, Seoul, Republic of Korea, October 22–26, 2018, pp. 1536–1544 (2018)

  38. Yoo, S., Bahng, H., Chung, S., Lee, J., Chang, J., Choo, J.: Coloring with limited data: few-shot colorization via memory augmented networks. In: IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2019, Long Beach, CA, USA, June 16–20, 2019, pp. 11283–11292 (2019)

  39. Ronneberger, O., Fischer, P., Brox, T.: U-net: convolutional networks for biomedical image segmentation. In: Medical Image Computing and Computer-Assisted Intervention—MICCAI 2015—18th International Conference Munich, Germany, October 5–9, 2015, Proceedings, Part III, vol. 9351, pp. 234–241 (2015)

  40. Chen, L., Zhu, Y., Papandreou, G., Schroff, F., Adam, H.: Encoder–decoder with atrous separable convolution for semantic image segmentation. In: Computer Vision—ECCV 2018—15th European Conference, Munich, Germany, September 8–14, 2018, Proceedings, Part VII, vol. 11211, pp. 833–851 (2018)

  41. Milletari, F., Navab, N., Ahmadi, S.: V-net: fully convolutional neural networks for volumetric medical image segmentation. In: Fourth International Conference on 3D Vision, 3DV 2016, Stanford, CA, USA, October 25–28, 2016, pp. 565–571 (2016)

  42. Gui, Y., Zhou, B., Zhang, J., Sun, C., Xiang, L., Zhang, J.: Learning interactive multi-object segmentation through appearance embedding and spatial attention. IET Image Process. 16(10), 2722–2737 (2022)

  43. Everingham, M., Gool, L.V., Williams, C.K.I., Winn, J.M., Zisserman, A.: The pascal visual object classes (VOC) challenge. Int. J. Comput. Vis. 88(2), 303–338 (2010)

  44. Lin, T., Maire, M., Belongie, S.J., Hays, J., Perona, P., Ramanan, D., Dollár, P., Zitnick, C.L.: Microsoft COCO: common objects in context. In: Computer Vision—ECCV 2014—13th European Conference, Zurich, Switzerland, September 6–12, 2014, Proceedings, Part V, vol. 8693, pp. 740–755 (2014)

  45. McGuinness, K., O’Connor, N.E.: A comparative evaluation of interactive segmentation algorithms. Pattern Recogn. 43(2), 434–444 (2010)

  46. Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. In: 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7–9, 2015, Conference Track Proceedings (2015)


Acknowledgements

We would like to thank the reviewers for their valuable comments. This work was supported by the National Natural Science Foundation of China (Project Nos. 62272164, 61972056, 61402053), the Hunan Provincial Natural Science Foundation of China (Grant No. 2021JJ30743) and the Scientific Research Fund of Education Department of Hunan Province (Grant No. 21B0287).

Author information

Corresponding author

Correspondence to Gui Yan.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Yan, G., Zhengyan, Z., Zhihua, C. et al. CGAN: lightweight and feature aggregation network for high-performance interactive image segmentation. Vis Comput 40, 2203–2217 (2024). https://doi.org/10.1007/s00371-023-02911-0

