Abstract
Accurate and efficient instance segmentation is crucial in many modern applications, such as autonomous driving and robotic manipulation. In this paper, we propose a straightforward and flexible two-stage framework for instance segmentation that simultaneously generates box-level localization information for the image and instance-level segmentation information for each instance. We name this framework the Attention-based Adaptive Context Network for anchor-free Instance Segmentation (ContextMask). It extends the object detector FCOS (Fully Convolutional One-Stage Object Detection) with a novel multi-scale adaptive context-guided mask (MACG-Mask) branch, which contains an adaptive context network and a MaskIoU branch. The adaptive context network combines the global context within the predicted bounding boxes, and the MaskIoU branch evaluates the quality of the predicted masks. As deep convolutional neural networks grow ever deeper, it becomes difficult to balance spatial and semantic information. To address this issue, we design a weighted FPN, which concatenates and weights feature maps of different resolutions to obtain feature maps with well-balanced spatial and semantic information. In addition, we propose an attention-based head that adds spatial and channel attention modules, giving each pixel its own weight, to cope with the large scale variation of objects. We verify the effectiveness of ContextMask on the finely annotated Cityscapes dataset and on the COCO dataset. ContextMask outperforms state-of-the-art methods, achieving 38.4% AP on Cityscapes and 39.0% AP on COCO.
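The abstract states only that the weighted FPN concatenates and weights feature maps of different resolutions. The NumPy sketch below illustrates one way to do that. The nearest-neighbor resizing, the ReLU-plus-normalization of the weights (borrowed from EfficientDet-style fast normalized fusion) and the helper names `resize_nearest` and `weighted_concat_fuse` are illustrative assumptions, not the paper's definition.

```python
import numpy as np


def resize_nearest(x, out_hw):
    """Nearest-neighbor resize of a (C, H, W) feature map to (C, H', W')."""
    _, h, w = x.shape
    oh, ow = out_hw
    rows = np.arange(oh) * h // oh  # source row for each output row
    cols = np.arange(ow) * w // ow  # source column for each output column
    return x[:, rows[:, None], cols[None, :]]


def weighted_concat_fuse(features, raw_weights, out_hw, eps=1e-4):
    """Resize maps to a common resolution, scale each by a normalized
    non-negative weight, and concatenate them along the channel axis.

    Hypothetical helper: the ReLU + sum-normalization of the weights follows
    EfficientDet-style fast normalized fusion, assumed here for illustration.
    """
    w = np.maximum(np.asarray(raw_weights, dtype=float), 0.0)  # keep weights >= 0
    w = w / (w.sum() + eps)                                     # weights sum to ~1
    scaled = [wi * resize_nearest(f, out_hw) for wi, f in zip(w, features)]
    return np.concatenate(scaled, axis=0), w


# Example: fuse three pyramid levels of decreasing resolution at the finest one.
p3 = np.ones((2, 8, 8))
p4 = 2.0 * np.ones((2, 4, 4))
p5 = 3.0 * np.ones((2, 2, 2))
fused, weights = weighted_concat_fuse([p3, p4, p5], [1.0, 1.0, -0.5], (8, 8))
print(fused.shape)  # (6, 8, 8)
```

In a trained network the raw weights would be learnable parameters, and a 1×1 convolution would typically follow the concatenation to mix the channels; both are omitted here to keep the weighting step visible.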
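The abstract says the attention-based head adds channel and spatial attention so that each pixel receives its own weight. A minimal NumPy sketch of that idea follows, assuming a CBAM-like ordering (channel gate, then spatial gate). The weight shapes, the 1×1 spatial combination that stands in for CBAM's 7×7 convolution, and all function names are illustrative assumptions rather than ContextMask's actual head.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def channel_attention(x, w1, w2):
    """Channel gate in the style of SE/CBAM.

    x: (C, H, W); w1: (C // r, C); w2: (C, C // r) for a reduction ratio r.
    Returns the reweighted map and one weight per channel.
    """
    def mlp(v):  # shared two-layer MLP with ReLU
        return w2 @ np.maximum(w1 @ v, 0.0)

    avg = x.mean(axis=(1, 2))  # global average pooling -> (C,)
    mx = x.max(axis=(1, 2))    # global max pooling -> (C,)
    gate = sigmoid(mlp(avg) + mlp(mx))
    return x * gate[:, None, None], gate


def spatial_attention(x, a, b, bias=0.0):
    """Per-pixel gate from the channel-wise mean and max maps.

    Simplification: a learned 1x1 combination (a, b, bias) replaces the
    7x7 convolution that CBAM uses.
    """
    avg = x.mean(axis=0)  # (H, W)
    mx = x.max(axis=0)    # (H, W)
    gate = sigmoid(a * avg + b * mx + bias)  # one weight per pixel
    return x * gate[None, :, :], gate


def attention_block(x, w1, w2, a, b):
    """Channel attention followed by spatial attention (CBAM ordering)."""
    x, _ = channel_attention(x, w1, w2)
    x, _ = spatial_attention(x, a, b)
    return x


rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 5, 5))  # C=8 channels, 5x5 spatial grid
w1 = rng.standard_normal((2, 8))       # reduction ratio r = 4
w2 = rng.standard_normal((8, 2))
out = attention_block(feat, w1, w2, a=1.0, b=1.0)
print(out.shape)  # (8, 5, 5)
```

The channel gate decides which feature channels matter for the current input, while the spatial gate assigns a separate weight to every location, which is the property the abstract relies on for handling objects of very different sizes.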
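The abstract describes the MaskIoU branch only as evaluating the quality of the predicted masks. In Mask Scoring R-CNN, which introduced the MaskIoU head, the branch regresses the IoU between a predicted mask (binarized at 0.5) and its matched ground-truth mask, and at inference the mask score is the classification score multiplied by the predicted MaskIoU. Assuming ContextMask follows the same convention, the training target and the rescoring step can be sketched as below; the function names and the empty-mask convention are assumptions.

```python
import numpy as np


def mask_iou_target(pred_prob, gt_mask, thresh=0.5):
    """IoU between a binarized predicted mask and a binary ground-truth mask.

    This value would serve as the regression target for a MaskIoU branch.
    Returns 0.0 when both masks are empty (a convention chosen here).
    """
    pred = pred_prob >= thresh
    gt = gt_mask.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return float(inter) / float(union) if union > 0 else 0.0


def mask_score(cls_score, predicted_mask_iou):
    """Rescore a detection by its predicted mask quality."""
    return cls_score * predicted_mask_iou


pred = np.array([[0.9, 0.8, 0.1],
                 [0.7, 0.2, 0.1],
                 [0.0, 0.0, 0.0]])
gt = np.array([[1, 1, 0],
               [0, 0, 0],
               [0, 0, 1]])
target = mask_iou_target(pred, gt)
print(target)                # 0.5
print(mask_score(0.8, 0.5))  # 0.4
```

Here the binarized prediction and the ground truth share 2 of the 4 pixels covered by either mask, so the target is 0.5; a detection with classification score 0.8 and predicted MaskIoU 0.5 would then be ranked with a mask score of 0.4.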
Acknowledgements
This work is supported by the National Natural Science Foundation of China (NNSF) under Grant 62073237.
Ethics declarations
Conflict of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Zhang, T., Zhang, G., Yan, M. et al. Attention-based adaptive context network for anchor-free instance segmentation. Int. J. Mach. Learn. & Cyber. 14, 537–549 (2023). https://doi.org/10.1007/s13042-022-01648-x
DOI: https://doi.org/10.1007/s13042-022-01648-x