RISAT: real-time instance segmentation with adversarial training

Multimedia Tools and Applications

Abstract

With the development of artificial intelligence, autonomous driving has gradually attracted attention from academia and industry. Detecting road conditions correctly and in a timely manner is essential to autonomous driving. We therefore propose RISAT, a flexible and parallel framework for real-time instance segmentation. RISAT improves on YOLOv3 by adding a new parallel branch that generates masks, and it uses a GAN to produce high-quality segmentation for each instance. Furthermore, we utilize both an ROI class loss for per-class mask learning and a perceptual loss for detailed information. On the MS COCO benchmark, RISAT achieves 43 frames per second (FPS), which is much faster than MNC and FCIS. In addition, the average precision (AP) of RISAT is 0.5 higher than that of the previous one-stage object detection method.
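The abstract names three training signals (a mask loss, an adversarial signal from a GAN, and a perceptual loss) but does not give their formulation. The following is a generic sketch of how such terms are commonly combined into one objective for the mask branch, not the authors' implementation. The function names, the binary cross-entropy and mean-squared-error choices, and the weights `w_adv` and `w_perc` are all assumptions made for illustration.

```python
import math


def bce(pred, target, eps=1e-7):
    """Mean binary cross-entropy between two flat lists of values in [0, 1]."""
    total = 0.0
    for p, t in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(t * math.log(p) + (1.0 - t) * math.log(1.0 - p))
    return total / len(pred)


def mse(a, b):
    """Mean squared error between two flat lists of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def mask_branch_loss(pred_mask, gt_mask, disc_score, feat_pred, feat_gt,
                     w_adv=0.1, w_perc=0.1):
    """Hypothetical combined objective for a mask generator.

    pred_mask, gt_mask : flattened predicted / ground-truth mask values
    disc_score         : discriminator's probability that pred_mask is real
    feat_pred, feat_gt : flattened feature vectors from some fixed network
    w_adv, w_perc      : illustrative weights, not taken from the paper
    """
    mask_term = bce(pred_mask, gt_mask)        # per-pixel mask loss
    adv_term = bce([disc_score], [1.0])        # generator wants "real"
    perc_term = mse(feat_pred, feat_gt)        # feature-space distance
    return mask_term + w_adv * adv_term + w_perc * perc_term


good = mask_branch_loss([1.0, 0.0], [1.0, 0.0], 1.0, [0.2, 0.4], [0.2, 0.4])
poor = mask_branch_loss([0.5, 0.5], [1.0, 0.0], 0.5, [0.0, 0.0], [0.2, 0.4])
print(good < poor)
```

In this sketch a prediction that matches the ground-truth mask, fools the discriminator, and matches the reference features yields a loss near zero, while a less accurate prediction yields a larger value.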

Acknowledgements

We would like to thank the anonymous reviewers for their invaluable comments.

Author information

Corresponding author

Correspondence to Songwen Pei.

Ethics declarations

Conflict of Interests

This work was partially funded by the National Natural Science Foundation of China under Grant 61975124, the Shanghai Natural Science Foundation (20ZR1428600), the Open Project Program of Shanghai Key Laboratory of Data Science (No. 2020090600003), and the Open Project Funding from the State Key Lab of Computer Architecture, ICT, CAS under Grant CARCHA202111. Any opinions, findings, and conclusions expressed in this paper are those of the authors and do not necessarily reflect the views of the sponsors.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Pei, S., Ni, B., Shen, T. et al. RISAT: real-time instance segmentation with adversarial training. Multimed Tools Appl 82, 4063–4080 (2023). https://doi.org/10.1007/s11042-022-13447-1

  • DOI: https://doi.org/10.1007/s11042-022-13447-1
