Abstract
Panoptic segmentation is a challenging task that aims to provide a comprehensive scene parsing result, and considerable research effort has been devoted to improving its accuracy and efficiency. In this paper, we propose a single shot panoptic segmentation network (SSPSNet) to handle this task more accurately. SSPSNet extends the object detection network FCOS by adding a mask segmentation branch that predicts instance masks and a semantic segmentation branch that predicts the classes of background pixels. In addition, we design a parameter-free identical mapping connection module that adds shortcut connections to the mask segmentation branch and to the FCOS classification and regression branches, so that more expressive feature maps are extracted for the instance segmentation and object detection subtasks. More importantly, we design a parameter-free category and location aware module that transfers the category and location information of FCOS to the mask and semantic segmentation branches, improving their ability to distinguish instances from background. Experimental results show that the proposed SSPSNet achieves 44.0/45.8 PQ at 11.6/10.0 FPS on COCO-Panoptic 2017 with ResNet-50/101-FPN as the backbone, reaching state-of-the-art performance with fewer parameters and less computation.
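For readers unfamiliar with the PQ figures quoted above, the following minimal sketch shows how panoptic quality is commonly computed for a single class, following the standard definition of the metric: the sum of IoUs over matched segment pairs divided by |TP| + 0.5|FP| + 0.5|FN|. The function name and arguments are hypothetical and are not taken from the SSPSNet implementation; the sketch assumes that predicted and ground-truth segments have already been matched (a pair counts as a match only when its IoU exceeds 0.5).

```python
def panoptic_quality(matched_ious, num_false_positives, num_false_negatives):
    """Panoptic quality (PQ) for a single class.

    matched_ious: IoU values of matched (prediction, ground truth) segment
        pairs; each is assumed to be > 0.5, the usual matching criterion.
    num_false_positives: predicted segments left without a match.
    num_false_negatives: ground-truth segments left without a match.
    """
    if any(iou <= 0.5 for iou in matched_ious):
        raise ValueError("matched pairs are expected to have IoU > 0.5")

    num_true_positives = len(matched_ious)
    denominator = (num_true_positives
                   + 0.5 * num_false_positives
                   + 0.5 * num_false_negatives)
    if denominator == 0:
        # Assumption of this sketch: a class with no segments scores 0.0.
        # Benchmark tooling typically leaves such classes out of the average.
        return 0.0
    return sum(matched_ious) / denominator


if __name__ == "__main__":
    # Two matched segments, one spurious prediction, one missed ground truth.
    print(panoptic_quality([0.8, 0.6], 1, 1))
```

The dataset-level PQ is then the mean of the per-class values, which is why the metric is usually reported as a percentage.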
Funding
This work is supported by the National Natural Science Foundation of China (Grant No. 61703088), the Fundamental Research Funds for the Central Universities (Grant No. N2105009) and the Doctoral Scientific Research Foundation of Liaoning Province (Grant No. 20170520326).
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
About this article
Cite this article
Wang, Q., Wang, Y., Zhou, Y. et al. SSPSNet: a single shot panoptic segmentation network for accurate scene parsing. Neural Comput & Applic 34, 677–688 (2022). https://doi.org/10.1007/s00521-021-06350-7
DOI: https://doi.org/10.1007/s00521-021-06350-7