
Attention-based adaptive context network for anchor-free instance segmentation

  • Original Article
International Journal of Machine Learning and Cybernetics

Abstract

It is crucial to obtain accurate and efficient instance segmentation masks in many modern applications such as autonomous driving and robotic manipulation. In this paper, we propose a straightforward and flexible two-stage framework for instance segmentation that simultaneously generates box-level localization information for an image and instance-level segmentation information for each instance. We name this framework Attention-based Adaptive Context Network for anchor-free Instance Segmentation (ContextMask). It extends the object detector FCOS (Fully Convolutional One-Stage Object Detection) with a novel multi-scale adaptive context-guided mask (MACG-Mask) branch that contains an adaptive context network and a MaskIoU branch: the adaptive context network combines global context within the predicted bounding boxes, and the MaskIoU branch evaluates the quality of the predicted masks. As deep convolutional neural networks grow ever deeper, it becomes difficult to balance spatial and semantic information well. To address this issue, we design a weighted FPN, which produces feature maps with well-balanced spatial and semantic information by concatenating and weighting feature maps of different resolutions. We also propose an attention-based head, which adds spatial and channel attention modules so that each pixel receives its own weight, addressing the problem of large scale variation among objects. We verify ContextMask’s effectiveness on the fine-annotation Cityscapes dataset and on COCO. ContextMask outperforms state-of-the-art methods, achieving \(38.4\%\) AP on Cityscapes and \(39.0\%\) AP on COCO.
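As a concrete illustration of the weighted FPN idea described above (feature maps of different resolutions concatenated and combined with learned weights), the following is a minimal PyTorch sketch. The module name `WeightedFusion`, the softmax-normalized per-level weights, and the 1×1 reduction convolution are illustrative assumptions for exposition, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class WeightedFusion(nn.Module):
    """Fuse multi-resolution feature maps with learnable per-level weights."""

    def __init__(self, num_levels: int, channels: int):
        super().__init__()
        # One learnable scalar per pyramid level (hypothetical parameterization).
        self.level_weights = nn.Parameter(torch.ones(num_levels))
        # 1x1 convolution to mix the concatenated levels back to `channels`.
        self.reduce = nn.Conv2d(num_levels * channels, channels, kernel_size=1)

    def forward(self, features):
        # `features`: list of [N, C, H_i, W_i] tensors, finest resolution first.
        target_size = features[0].shape[-2:]
        weights = torch.softmax(self.level_weights, dim=0)
        resized = [
            w * F.interpolate(f, size=target_size, mode="bilinear", align_corners=False)
            for w, f in zip(weights, features)
        ]
        # Concatenate along the channel axis and reduce back to `channels`.
        return self.reduce(torch.cat(resized, dim=1))


if __name__ == "__main__":
    fusion = WeightedFusion(num_levels=3, channels=256)
    feats = [torch.randn(1, 256, 32, 32),   # e.g. P3
             torch.randn(1, 256, 16, 16),   # e.g. P4
             torch.randn(1, 256, 8, 8)]     # e.g. P5
    print(fusion(feats).shape)  # torch.Size([1, 256, 32, 32])
```

In a framework like ContextMask, such fused maps would feed the detection and mask heads; the softmax normalization here simply keeps the learned level weights on a comparable scale.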



Acknowledgements

This work is supported by National Natural Science Foundation (NNSF) of China under Grant 62073237.

Author information

Corresponding author

Correspondence to Guoshan Zhang.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Zhang, T., Zhang, G., Yan, M. et al. Attention-based adaptive context network for anchor-free instance segmentation. Int. J. Mach. Learn. & Cyber. 14, 537–549 (2023). https://doi.org/10.1007/s13042-022-01648-x
