Abstract
With the development of artificial intelligence, autonomous driving has gradually attracted attention from academia and industry. Detecting road conditions correctly and in a timely manner is essential to autonomous driving. We therefore propose RISAT, a flexible and parallel framework for real-time instance segmentation. RISAT improves on YOLOv3 by adding a new parallel branch that generates masks. Using a generative adversarial network (GAN), RISAT produces high-quality segmentation for each instance. Furthermore, we use an ROI class loss for per-class mask learning and a perceptual loss to capture detailed information. On the MS COCO benchmark, RISAT reaches 43 frames per second (FPS), which is much faster than MNC and FCIS. In addition, the average precision (AP) of RISAT exceeds that of the previous one-stage object detection method by 0.5.
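To put the reported throughput in context, the sketch below converts an FPS figure into a per-frame time budget and shows one common way to time a frame-processing function. It is a minimal illustration, not the authors' benchmarking code: the function names `frame_latency_ms`, `measure_fps`, and `dummy_segmenter` are hypothetical, and the stand-in workload is not a segmentation model.

```python
import time


def frame_latency_ms(fps):
    """Return the average per-frame time budget in milliseconds for a given FPS."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


def measure_fps(process_frame, frames, warmup=2):
    """Time process_frame over all frames and return the measured frames per second.

    A few warm-up calls are made first so one-time setup cost is not counted.
    """
    frames = list(frames)
    for frame in frames[:warmup]:
        process_frame(frame)
    start = time.perf_counter()
    for frame in frames:
        process_frame(frame)
    elapsed = time.perf_counter() - start
    return len(frames) / elapsed if elapsed > 0 else float("inf")


def dummy_segmenter(frame):
    """Hypothetical stand-in for a model's forward pass (assumption, not RISAT)."""
    return sum(frame)


if __name__ == "__main__":
    # 43 FPS leaves roughly 23 ms of compute per frame.
    print(f"budget at 43 FPS: {frame_latency_ms(43):.2f} ms per frame")
    fake_frames = [[1, 2, 3]] * 50
    print(f"measured: {measure_fps(dummy_segmenter, fake_frames):.0f} FPS")
```

Measured throughput depends on hardware, input resolution, and batch size, so figures obtained this way are only comparable under the same test conditions.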
Acknowledgements
We would like to thank the anonymous reviewers for their invaluable comments.
Ethics declarations
Conflict of Interests
This work was partially funded by the National Natural Science Foundation of China under Grant 61975124, the Shanghai Natural Science Foundation (20ZR1428600), the Open Project Program of Shanghai Key Laboratory of Data Science (No. 2020090600003), and the Open Project Funding from the State Key Lab of Computer Architecture, ICT, CAS under Grant CARCHA202111. Any opinions, findings, and conclusions expressed in this paper are those of the authors and do not necessarily reflect the views of the sponsors.
About this article
Cite this article
Pei, S., Ni, B., Shen, T. et al. RISAT: real-time instance segmentation with adversarial training. Multimed Tools Appl 82, 4063–4080 (2023). https://doi.org/10.1007/s11042-022-13447-1
DOI: https://doi.org/10.1007/s11042-022-13447-1