RISAT: real-time instance segmentation with adversarial training

Pei, Songwen; Ni, Bo; Shen, Tianma; Zhou, Zhenling; Chen, Yewang; Qiu, Meikang

doi:10.1007/s11042-022-13447-1

Published: 21 July 2022

Multimedia Tools and Applications, Volume 82, pages 4063–4080 (2023)

Songwen Pei¹,²,³ (ORCID: orcid.org/0000-0003-0810-1458),
Bo Ni¹,
Tianma Shen⁴,
Zhenling Zhou¹,
Yewang Chen⁵ &
…
Meikang Qiu⁶

Abstract

With the development of artificial intelligence, autonomous driving has gradually attracted attention from academia and industry. Detecting road conditions correctly and in a timely manner is essential to autonomous driving. Thus, we propose a flexible and parallel framework called RISAT for real-time instance segmentation. RISAT improves on YOLOv3 by adding a new parallel branch to generate masks. Using a GAN, RISAT produces high-quality segmentation for each instance. Furthermore, we utilize both an ROI class loss for mask learning on each class and a perceptual loss for detailed information. On the MS COCO benchmark, RISAT achieves 43 frames per second (FPS), which is much faster than MNC and FCIS. In addition, the average precision (AP) of RISAT is 0.5 higher than that of the previous one-stage object detection method.
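The abstract names three training signals (a mask loss, an adversarial loss from a GAN, and a perceptual loss) without giving their formulas. The following is a minimal sketch of how such terms are commonly combined into one generator-side objective. It is not the authors' formulation: the function names, the flat-list inputs, and the weights `w_adv` and `w_perc` are all hypothetical and chosen only for illustration.

```python
import math


def bce(pred, target, eps=1e-7):
    """Mean binary cross-entropy between predicted probabilities and 0/1 targets."""
    total = 0.0
    for p, t in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(t * math.log(p) + (1.0 - t) * math.log(1.0 - p))
    return total / len(pred)


def mse(a, b):
    """Mean squared error between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def generator_loss(pred_mask, gt_mask, disc_score, feat_pred, feat_gt,
                   w_adv=0.1, w_perc=1.0):
    """Hypothetical composite loss for a mask-generating branch.

    pred_mask, gt_mask : flattened per-pixel mask probabilities / labels
    disc_score         : discriminator's probability that pred_mask is real
    feat_pred, feat_gt : features of predicted / ground-truth masks taken
                         from some fixed network (assumed, for the perceptual term)
    w_adv, w_perc      : illustrative weights, not values from the paper
    """
    mask_term = bce(pred_mask, gt_mask)          # per-pixel mask supervision
    adv_term = bce([disc_score], [1.0])          # generator wants "real" verdict
    perc_term = mse(feat_pred, feat_gt)          # feature-space similarity
    return mask_term + w_adv * adv_term + w_perc * perc_term
```

In this sketch, a prediction that matches the ground truth, convinces the discriminator, and has matching features yields a lower loss than one that fails on all three counts; how RISAT actually weights or defines these terms is described only in the full paper.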

Acknowledgements

We would like to thank the anonymous reviewers for their invaluable comments.

Author information

Authors and Affiliations

Department of Computer Science and Engineering, University of Shanghai for Science and Technology, Shanghai, 200093, China
Songwen Pei, Bo Ni & Zhenling Zhou
State Key Lab of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, 100190, China
Songwen Pei
Shanghai Key Laboratory of Data Science, Fudan University, Shanghai, 200433, China
Songwen Pei
Department of Computer Science and Engineering, Santa Clara University, Santa Clara, 95053, CA, USA
Tianma Shen
College of Computer Science and Technology, Huaqiao University, Xiamen, 361021, China
Yewang Chen
Department of Computer Science, Texas A&M University-Commerce, Commerce, TX, USA
Meikang Qiu

Corresponding author

Correspondence to Songwen Pei.

Ethics declarations

Conflict of Interests

This work was partially funded by the National Natural Science Foundation of China under Grant 61975124, the Shanghai Natural Science Foundation (20ZR1428600), the Open Project Program of Shanghai Key Laboratory of Data Science (No. 2020090600003), and the Open Project Funding from the State Key Lab of Computer Architecture, ICT, CAS under Grant CARCHA202111. Any opinions, findings and conclusions expressed in this paper are those of the authors and do not necessarily reflect the views of the sponsors.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Pei, S., Ni, B., Shen, T. et al. RISAT: real-time instance segmentation with adversarial training. Multimed Tools Appl 82, 4063–4080 (2023). https://doi.org/10.1007/s11042-022-13447-1

Received: 05 July 2021
Revised: 04 May 2022
Accepted: 02 July 2022
Published: 21 July 2022
Issue Date: January 2023
DOI: https://doi.org/10.1007/s11042-022-13447-1
