SSPSNet: a single shot panoptic segmentation network for accurate scene parsing

  • Original Article
Neural Computing and Applications

Abstract

Panoptic segmentation is a challenging task that aims to provide a comprehensive scene parsing result, and researchers have devoted considerable effort to improving its accuracy and efficiency. In this paper, we propose a single shot panoptic segmentation network (SSPSNet) to handle this task more accurately. SSPSNet extends the object detection network FCOS by adding a mask segmentation branch that predicts instance masks and a semantic segmentation branch that predicts the classes of background pixels. In addition, we design a parameter-free identical mapping connection module that adds shortcut connections to the mask segmentation, FCOS classification, and FCOS regression branches, so that they extract more expressive feature maps for the instance segmentation and object detection subtasks. More importantly, we design a parameter-free category and location aware module that transfers the category and location information of FCOS to the mask and semantic segmentation branches, improving their ability to distinguish instances and background. Experimental results show that SSPSNet achieves 44.0/45.8 PQ at 11.6/10.0 FPS on COCO-Panoptic 2017 with ResNet-50/101-FPN as the backbone, which is state-of-the-art performance with fewer parameters and less computation.
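The abstract reports results in PQ (panoptic quality) without defining it. As a minimal sketch, assuming the standard definition introduced with the panoptic segmentation task, PQ for one class is the sum of IoUs over matched segment pairs divided by |TP| + 0.5|FP| + 0.5|FN|, where a predicted and a ground-truth segment match when their IoU exceeds 0.5. The function name `panoptic_quality` and its arguments are hypothetical and are not taken from the paper. Benchmark figures such as those above are usually averaged over classes and reported on a 0 to 100 scale, which this sketch does not do.

```python
def panoptic_quality(matched_ious, num_pred, num_gt):
    """Per-class PQ, SQ and RQ (illustrative sketch, not the authors' code).

    matched_ious: IoU of every matched (prediction, ground truth) pair;
                  a pair counts as matched only if its IoU is above 0.5.
    num_pred:     total number of predicted segments of this class.
    num_gt:       total number of ground-truth segments of this class.
    """
    if any(iou <= 0.5 for iou in matched_ious):
        raise ValueError("matched pairs must have IoU > 0.5")

    tp = len(matched_ious)   # matched pairs
    fp = num_pred - tp       # predictions with no matching ground truth
    fn = num_gt - tp         # ground-truth segments with no matching prediction
    if fp < 0 or fn < 0:
        raise ValueError("more matches than segments")

    denom = tp + 0.5 * fp + 0.5 * fn
    if denom == 0:
        # No segments of this class at all; returning zeros is a convention
        # chosen for this sketch.
        return 0.0, 0.0, 0.0

    sq = sum(matched_ious) / tp if tp else 0.0   # segmentation quality
    rq = tp / denom                              # recognition quality
    return sq * rq, sq, rq                       # PQ = SQ * RQ


pq, sq, rq = panoptic_quality([0.8, 0.6], num_pred=3, num_gt=4)
print(f"PQ={pq:.3f} SQ={sq:.3f} RQ={rq:.3f}")
```

In this toy call there are two matches, one unmatched prediction, and two unmatched ground-truth segments, so the denominator is 2 + 0.5 + 1 = 3.5 and PQ is about 0.4, or 40 on the scale used in the abstract.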

Funding

This work is supported by the National Natural Science Foundation of China (Grant No. 61703088), the Fundamental Research Funds for the Central Universities (Grant No. N2105009), and the Doctoral Scientific Research Foundation of Liaoning Province (Grant No. 20170520326).

Author information

Corresponding author

Correspondence to Xiangde Zhang.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Wang, Q., Wang, Y., Zhou, Y. et al. SSPSNet: a single shot panoptic segmentation network for accurate scene parsing. Neural Comput & Applic 34, 677–688 (2022). https://doi.org/10.1007/s00521-021-06350-7

  • DOI: https://doi.org/10.1007/s00521-021-06350-7
