
Attention-based adaptive context network for anchor-free instance segmentation

  • Original Article
International Journal of Machine Learning and Cybernetics

Abstract

It is crucial to obtain accurate and efficient instance segmentation masks in many modern applications such as autonomous driving and robotic manipulation. In this paper, we propose a straightforward and flexible two-stage framework for instance segmentation that simultaneously generates box-level localization information for an image and instance-level segmentation information for each instance. We name this framework Attention-based Adaptive Context Network for anchor-free Instance Segmentation (ContextMask). It extends the object detector FCOS (Fully Convolutional One-Stage Object Detection) with a novel multi-scale adaptive context-guided mask (MACG-Mask) branch that contains an adaptive context network and a MaskIoU branch: the adaptive context network combines global context within the predicted bounding boxes, and the MaskIoU branch evaluates the quality of the predicted masks. As deep convolutional neural networks grow ever deeper, it becomes difficult to balance spatial and semantic information well. To address this issue, we design a weighted FPN, which produces feature maps with well-balanced spatial and semantic information by concatenating and weighting feature maps of different resolutions. We also propose an attention-based head, which adds spatial and channel attention modules so that each pixel receives its own weight, addressing the problem of large scale variation among objects. We verify ContextMask’s effectiveness on the fine-annotation Cityscapes dataset and on COCO. ContextMask outperforms state-of-the-art methods, achieving \(38.4\%\) AP on Cityscapes and \(39.0\%\) AP on COCO.
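As a concrete illustration of the weighted FPN idea described above (feature maps of different resolutions concatenated and combined with learned weights), the following is a minimal PyTorch sketch. The module name `WeightedFusion`, the softmax-normalized per-level weights, and the 1×1 reduction convolution are illustrative assumptions for exposition, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class WeightedFusion(nn.Module):
    """Fuse multi-resolution feature maps with learnable per-level weights."""

    def __init__(self, num_levels: int, channels: int):
        super().__init__()
        # One learnable scalar per pyramid level (hypothetical parameterization).
        self.level_weights = nn.Parameter(torch.ones(num_levels))
        # 1x1 convolution to mix the concatenated levels back to `channels`.
        self.reduce = nn.Conv2d(num_levels * channels, channels, kernel_size=1)

    def forward(self, features):
        # `features`: list of [N, C, H_i, W_i] tensors, finest resolution first.
        target_size = features[0].shape[-2:]
        weights = torch.softmax(self.level_weights, dim=0)
        resized = [
            w * F.interpolate(f, size=target_size, mode="bilinear", align_corners=False)
            for w, f in zip(weights, features)
        ]
        # Concatenate along the channel axis and reduce back to `channels`.
        return self.reduce(torch.cat(resized, dim=1))


if __name__ == "__main__":
    fusion = WeightedFusion(num_levels=3, channels=256)
    feats = [torch.randn(1, 256, 32, 32),   # e.g. P3
             torch.randn(1, 256, 16, 16),   # e.g. P4
             torch.randn(1, 256, 8, 8)]     # e.g. P5
    print(fusion(feats).shape)  # torch.Size([1, 256, 32, 32])
```

In a framework like ContextMask, such fused maps would feed the detection and mask heads; the softmax normalization here simply keeps the learned level weights on a comparable scale.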



Acknowledgements

This work is supported by National Natural Science Foundation (NNSF) of China under Grant 62073237.

Author information

Corresponding author

Correspondence to Guoshan Zhang.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Zhang, T., Zhang, G., Yan, M. et al. Attention-based adaptive context network for anchor-free instance segmentation. Int. J. Mach. Learn. & Cyber. 14, 537–549 (2023). https://doi.org/10.1007/s13042-022-01648-x
