
CGAN: lightweight and feature aggregation network for high-performance interactive image segmentation

  • Original article
  • The Visual Computer

Abstract

In interactive image segmentation, user interactions indicating the object of interest are used to predict a segmentation mask. Recent works have achieved state-of-the-art results using either backpropagating refinement or iterative training schemes, both of which are computationally expensive. In this paper, we propose a novel method for interactive image segmentation that uses conditional generative adversarial networks to enforce higher-order consistency in the segmentation, without extra post-processing during inference. Concretely, we develop a new segmentation network that integrates three modules, providing global contextual information and attention and fusing features across multiple layers. This allows the segmentation network to learn strong object representations and predict more accurate segmentations. We then employ a fully convolutional discriminator to detect and correct higher-order inconsistencies between the predictions of the segmentation network and the ground-truth label maps. To this end, we optimize an objective function that combines the conventional segmentation loss with an adversarial loss. We train our network on the Pascal VOC 2012 and MS COCO 2017 datasets and conduct comprehensive experiments on four benchmark datasets. Experimental results show that adding adversarial training to the network architecture improves segmentation results over state-of-the-art methods while keeping the system efficient in terms of speed.
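The abstract describes an objective that combines a conventional segmentation loss with an adversarial term supplied by a fully convolutional discriminator. As a minimal sketch of that idea (not the paper's exact formulation, which the abstract does not give), the generator's loss can be written as a per-pixel binary cross-entropy against the ground-truth mask plus a weighted term that rewards predictions the discriminator judges to be real; the function and parameter names (`combined_loss`, `lam`) are illustrative:

```python
import numpy as np

def combined_loss(pred, target, disc_conf, lam=0.1, eps=1e-7):
    """Sketch of a segmentation loss with an adversarial term.

    pred      : predicted foreground probabilities in (0, 1), shape (H, W)
    target    : ground-truth binary mask, shape (H, W)
    disc_conf : discriminator's per-pixel confidence that `pred` looks
                like a ground-truth mask, shape (H, W)
    lam       : weight of the adversarial term (illustrative value)
    """
    pred = np.clip(pred, eps, 1 - eps)
    disc_conf = np.clip(disc_conf, eps, 1 - eps)
    # Conventional segmentation loss: per-pixel binary cross-entropy.
    l_seg = -np.mean(target * np.log(pred)
                     + (1 - target) * np.log(1 - pred))
    # Adversarial term: the segmentation network is rewarded when the
    # fully convolutional discriminator labels its prediction as "real".
    l_adv = -np.mean(np.log(disc_conf))
    return l_seg + lam * l_adv
```

Under this sketch, minimizing the combined loss pushes the prediction both toward the ground truth pixel-wise and toward masks the discriminator cannot distinguish from real label maps, which is how the adversarial term enforces higher-order consistency without any post-processing at inference time.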


Data availability

Data will be made available on reasonable request.

References

  1. Boykov, Y., Jolly, M.: Interactive graph cuts for optimal boundary and region segmentation of objects in N-D images. In: Proceedings of the Eighth International Conference on Computer Vision (ICCV-01), Vancouver, British Columbia, Canada, July 7–14, 2001, vol. 1, pp. 105–112 (2001)

  2. Grady, L.J.: Random walks for image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 28(11), 1768–1783 (2006)

  3. Rother, C., Kolmogorov, V., Blake, A.: "GrabCut": interactive foreground extraction using iterated graph cuts. ACM Trans. Graph. 23(3), 309–314 (2004)

  4. Gulshan, V., Rother, C., Criminisi, A., Blake, A., Zisserman, A.: Geodesic star convexity for interactive image segmentation. In: The Twenty-Third IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2010, San Francisco, CA, USA, 13–18 June 2010, pp. 3129–3136 (2010)

  5. Price, B.L., Morse, B.S., Cohen, S.: Geodesic graph cut for interactive image segmentation. In: The Twenty-Third IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2010, San Francisco, CA, USA, 13–18 June 2010, pp. 3161–3168 (2010)

  6. Cheng, M., Prisacariu, V.A., Zheng, S., Torr, P.H.S., Rother, C.: DenseCut: densely connected CRFs for realtime grabcut. Comput. Graph. Forum 34(7), 193–201 (2015)

  7. Boykov, Y., Veksler, O., Zabih, R.: Fast approximate energy minimization via graph cuts. In: Proceedings of the International Conference on Computer Vision, Kerkyra, Corfu, Greece, September 20–25, 1999, pp. 377–384 (1999)

  8. Kolmogorov, V., Zabih, R.: What energy functions can be minimized via graph cuts? IEEE Trans. Pattern Anal. Mach. Intell. 26(2), 147–159 (2004)

  9. He, X., Gould, S.: An exemplar-based CRF for multi-instance object segmentation. In: 2014 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2014, Columbus, OH, USA, June 23–28, 2014, pp. 296–303 (2014)

  10. Xu, N., Price, B.L., Cohen, S., Yang, J., Huang, T.S.: Deep interactive object selection. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27–30, 2016, pp. 373–381 (2016)

  11. Liew, J.H., Wei, Y., Xiong, W., Ong, S.H., Feng, J.: Regional interactive image segmentation networks. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22–29, 2017, pp. 2746–2754 (2017)

  12. Maninis, K., Caelles, S., Pont-Tuset, J., Gool, L.V.: Deep extreme cut: from extreme points to object segmentation. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18–22, 2018, pp. 616–625 (2018)

  13. Li, Z., Chen, Q., Koltun, V.: Interactive image segmentation with latent diversity. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18–22, 2018, pp. 577–585 (2018)

  14. Hu, Y., Soltoggio, A., Lock, R., Carter, S.: A fully convolutional two-stream fusion network for interactive image segmentation. Neural Netw. 109, 31–42 (2019)

  15. Lin, Z., Zhang, Z., Chen, L., Cheng, M., Lu, S.: Interactive image segmentation with first click attention. In: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2020, Seattle, WA, USA, June 13–19, 2020, pp. 13336–13345 (2020)

  16. Mahadevan, S., Voigtlaender, P., Leibe, B.: Iteratively trained interactive segmentation. In: British Machine Vision Conference 2018, BMVC 2018, Newcastle, UK, September 3–6, 2018, p. 212 (2018)

  17. Sofiiuk, K., Petrov, I.A., Konushin, A.: Reviving iterative training with mask guidance for interactive segmentation. In: 2022 IEEE International Conference on Image Processing, ICIP 2022, Bordeaux, France, 16–19 October 2022, pp. 3141–3145 (2022)

  18. Jang, W., Kim, C.: Interactive image segmentation via backpropagating refinement scheme. In: IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2019, Long Beach, CA, USA, June 16–20, 2019, pp. 5297–5306 (2019)

  19. Sofiiuk, K., Petrov, I.A., Barinova, O., Konushin, A.: F-BRS: rethinking backpropagating refinement for interactive segmentation. In: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2020, Seattle, WA, USA, June 13–19, 2020, pp. 8620–8629 (2020)

  20. Majumder, S., Yao, A.: Content-aware multi-level guidance for interactive instance segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2019, Long Beach, CA, USA, June 16–20, 2019, pp. 11602–11611 (2019)

  21. Acuna, D., Ling, H., Kar, A., Fidler, S.: Efficient interactive annotation of segmentation datasets with polygon-RNN++. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18–22, 2018, pp. 859–868 (2018)

  22. Chen, X., Zhao, Z., Yu, F., Zhang, Y., Duan, M.: Conditional diffusion for interactive segmentation. In: 2021 IEEE/CVF International Conference on Computer Vision, ICCV 2021, Montreal, QC, Canada, October 10–17, 2021, pp. 7325–7334 (2021)

  23. Hao, Y., Liu, Y., Wu, Z., Han, L., Chen, Y., Chen, G., Chu, L., Tang, S., Yu, Z., Chen, Z., Lai, B.: Edgeflow: achieving practical interactive segmentation with edge-guided flow. In: IEEE/CVF International Conference on Computer Vision Workshops, ICCVW 2021, Montreal, BC, Canada, October 11–17, 2021, pp. 1551–1560 (2021)

  24. Chen, X., Zhao, Z., Zhang, Y., Duan, M., Qi, D., Zhao, H.: Focalclick: towards practical interactive image segmentation. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022, New Orleans, LA, USA, June 18–24, 2022, pp. 1290–1299 (2022)

  25. Liu, Q., Xu, Z., Jiao, Y., Niethammer, M.: iSegFormer: interactive segmentation via transformers with application to 3D knee MR images. In: Medical Image Computing and Computer Assisted Intervention—MICCAI 2022—25th International Conference, Singapore, September 18–22, 2022, Proceedings, Part V, vol. 13435, pp. 464–474 (2022)

  26. Zhang, S., Liew, J.H., Wei, Y., Wei, S., Zhao, Y.: Interactive object segmentation with inside–outside guidance. In: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2020, Seattle, WA, USA, June 13–19, 2020, pp. 12231–12241 (2020)

  27. Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A.C., Bengio, Y.: Generative adversarial nets. In: Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014, December 8–13 2014, Montreal, Quebec, Canada, pp. 2672–2680 (2014)

  28. Wang, L., Sun, Y., Wang, Z.: CCS-GAN: a semi-supervised generative adversarial network for image classification. Vis. Comput. 38(6), 2009–2021 (2022)

  29. Ledig, C., Theis, L., Huszar, F., Caballero, J., Cunningham, A., Acosta, A., Aitken, A.P., Tejani, A., Totz, J., Wang, Z., Shi, W.: Photo-realistic single image super-resolution using a generative adversarial network. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21–26, 2017, pp. 105–114 (2017)

  30. Luc, P., Couprie, C., Chintala, S., Verbeek, J.: Semantic segmentation using adversarial networks. CoRR arxiv:1611.08408 (2016)

  31. Souly, N., Spampinato, C., Shah, M.: Semi and weakly supervised semantic segmentation using generative adversarial network. CoRR arxiv:1703.09695 (2017)

  32. Hung, W., Tsai, Y., Liou, Y., Lin, Y., Yang, M.: Adversarial learning for semi-supervised semantic segmentation. In: British Machine Vision Conference 2018, BMVC 2018, Newcastle, UK, September 3–6, 2018, p. 65 (2018)

  33. Mirza, M., Osindero, S.: Conditional generative adversarial nets. CoRR arxiv:1411.1784 (2014)

  34. Reed, S.E., Akata, Z., Yan, X., Logeswaran, L., Schiele, B., Lee, H.: Generative adversarial text to image synthesis. In: Proceedings of the 33rd International Conference on Machine Learning, ICML 2016, New York City, NY, USA, June 19–24, 2016, vol. 48, pp. 1060–1069 (2016)

  35. Isola, P., Zhu, J., Zhou, T., Efros, A.A.: Image-to-image translation with conditional adversarial networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21–26, 2017, pp. 5967–5976 (2017)

  36. Adachi, H., Fukui, H., Yamashita, T., Fujiyoshi, H.: Facial image generation by generative adversarial networks using weighted conditions. In: Proceedings of the 14th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications, VISIGRAPP 2019, Volume 4: VISAPP, Prague, Czech Republic, February 25–27, 2019, pp. 139–145 (2019)

  37. Ci, Y., Ma, X., Wang, Z., Li, H., Luo, Z.: User-guided deep anime line art colorization with conditional adversarial networks. In: 2018 ACM Multimedia Conference on Multimedia Conference, MM 2018, Seoul, Republic of Korea, October 22–26, 2018, pp. 1536–1544 (2018)

  38. Yoo, S., Bahng, H., Chung, S., Lee, J., Chang, J., Choo, J.: Coloring with limited data: few-shot colorization via memory augmented networks. In: IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2019, Long Beach, CA, USA, June 16–20, 2019, pp. 11283–11292 (2019)

  39. Ronneberger, O., Fischer, P., Brox, T.: U-net: convolutional networks for biomedical image segmentation. In: Medical Image Computing and Computer-Assisted Intervention—MICCAI 2015—18th International Conference Munich, Germany, October 5–9, 2015, Proceedings, Part III, vol. 9351, pp. 234–241 (2015)

  40. Chen, L., Zhu, Y., Papandreou, G., Schroff, F., Adam, H.: Encoder–decoder with atrous separable convolution for semantic image segmentation. In: Computer Vision—ECCV 2018—15th European Conference, Munich, Germany, September 8–14, 2018, Proceedings, Part VII, vol. 11211, pp. 833–851 (2018)

  41. Milletari, F., Navab, N., Ahmadi, S.: V-net: fully convolutional neural networks for volumetric medical image segmentation. In: Fourth International Conference on 3D Vision, 3DV 2016, Stanford, CA, USA, October 25–28, 2016, pp. 565–571 (2016)

  42. Gui, Y., Zhou, B., Zhang, J., Sun, C., Xiang, L., Zhang, J.: Learning interactive multi-object segmentation through appearance embedding and spatial attention. IET Image Process. 16(10), 2722–2737 (2022)

  43. Everingham, M., Gool, L.V., Williams, C.K.I., Winn, J.M., Zisserman, A.: The pascal visual object classes (VOC) challenge. Int. J. Comput. Vis. 88(2), 303–338 (2010)

  44. Lin, T., Maire, M., Belongie, S.J., Hays, J., Perona, P., Ramanan, D., Dollár, P., Zitnick, C.L.: Microsoft COCO: common objects in context. In: Computer Vision—ECCV 2014—13th European Conference, Zurich, Switzerland, September 6–12, 2014, Proceedings, Part V, vol. 8693, pp. 740–755 (2014)

  45. McGuinness, K., O’Connor, N.E.: A comparative evaluation of interactive segmentation algorithms. Pattern Recogn. 43(2), 434–444 (2010)

  46. Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. In: 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7–9, 2015, Conference Track Proceedings (2015)


Acknowledgements

We would like to thank the reviewers for their valuable comments. This work was supported by the National Natural Science Foundation of China (Project Nos. 62272164, 61972056, 61402053), the Hunan Provincial Natural Science Foundation of China (Grant No. 2021JJ30743) and the Scientific Research Fund of Education Department of Hunan Province (Grant No. 21B0287).

Author information

Corresponding author

Correspondence to Gui Yan.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Yan, G., Zhengyan, Z., Zhihua, C. et al. CGAN: lightweight and feature aggregation network for high-performance interactive image segmentation. Vis Comput 40, 2203–2217 (2024). https://doi.org/10.1007/s00371-023-02911-0

