Multi-scale pooling learning for camouflaged instance segmentation

Li, Chen; Jiao, Ge; Yue, Guowen; He, Rong; Huang, Jiayu

doi:10.1007/s10489-024-05369-2

Published: 19 March 2024

Applied Intelligence, volume 54, pages 4062–4076 (2024)

Abstract

Camouflaged instance segmentation (CIS) aims to segment instances that blend into their background. Existing CIS methods emphasize global interactions but overlook hidden cues at different scales, which leads to inaccurate recognition of camouflaged instances. To address this, we propose a multi-scale pooling network (MSPNet) that mines the hidden cues camouflaged instances offer at various scales. The network strengthens the fusion of multi-scale information mainly through multilayer pooling. Specifically, the pyramid pooling transformer (P2T) serves as a robust backbone for extracting multi-scale features. We then introduce an end-to-end pooling learning transformer (PLT) to obtain instance-aware parameters and high-quality mask features. To further improve the fusion of the various mask features, we design a novel multi-scale complementary feature pooling (MCFP) module. We also propose an instance normalization module with fused spatial attention (FSA-IN) that combines the instance-aware parameters with the mask features to produce the final camouflaged instances. Experimental results show that MSPNet surpasses existing CIS models on the COD10K-Test and NC4K datasets, with average precision (AP) scores of 49.6% and 53.4%, respectively. Our code will be published at https://github.com/another-u/MSPNet-main.
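As a rough illustration of the multi-scale pooling idea the abstract refers to, the sketch below average-pools one feature map to several grid sizes and concatenates the results into a single token list. It shows generic pyramid pooling only, not the MSPNet, P2T, or MCFP implementation. The function names (`adaptive_avg_pool2d`, `pyramid_pool_tokens`), the pooling sizes (1, 2, 4), and the toy 4×4 input are assumptions made for the example.

```python
from typing import List, Sequence

Grid = List[List[float]]


def adaptive_avg_pool2d(feature: Grid, out_h: int, out_w: int) -> Grid:
    """Average-pool a 2D map down to an (out_h, out_w) grid."""
    in_h, in_w = len(feature), len(feature[0])
    pooled = []
    for i in range(out_h):
        # Bin edges: floor for the start, ceil for the end, so bins cover the input.
        r0 = (i * in_h) // out_h
        r1 = -((-(i + 1) * in_h) // out_h)
        row = []
        for j in range(out_w):
            c0 = (j * in_w) // out_w
            c1 = -((-(j + 1) * in_w) // out_w)
            window = [feature[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            row.append(sum(window) / len(window))
        pooled.append(row)
    return pooled


def pyramid_pool_tokens(feature: Grid, sizes: Sequence[int] = (1, 2, 4)) -> List[float]:
    """Pool the map at each size (hypothetical choice) and concatenate the flattened grids."""
    tokens: List[float] = []
    for s in sizes:
        pooled = adaptive_avg_pool2d(feature, s, s)
        tokens.extend(value for row in pooled for value in row)
    return tokens


# Toy single-channel 4x4 feature map holding the values 0..15.
feature_map = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
tokens = pyramid_pool_tokens(feature_map, sizes=(1, 2, 4))
print(len(tokens))   # 1 + 4 + 16 = 21 pooled values
print(tokens[:5])    # coarsest summaries first: [7.5, 2.5, 4.5, 10.5, 12.5]
```

The coarse grids summarize global context while the fine grid keeps local detail, so a downstream module sees the same region at several scales. In a real network the pooling would run per channel on tensors, typically with a deep learning framework rather than nested lists.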

Data Availability

Data are available from the authors on request.

Acknowledgements

This work was supported by the Postgraduate Scientific Research Innovation Project of Hunan Province (CX20231264), the Hunan Provincial Natural Science Foundation of China (2021JJ50074, 2022JJ50016), the Science and Technology Plan Project of Hunan Province (2016TP1020) and the 14th Five-Year Plan Key Disciplines and Application-oriented Special Disciplines of Hunan Province (Xiangjiaotong [2022] 351).

Author information

Authors and Affiliations

College of Computer Science and Technology, Hengyang Normal University, Hengyang, 421002, China
Chen Li, Ge Jiao, Guowen Yue, Rong He & Jiayu Huang
Hunan Provincial Key Laboratory of Intelligent Information Processing and Application, Hengyang, 421002, China
Ge Jiao

Corresponding author

Correspondence to Ge Jiao.

Ethics declarations

Conflict of interests

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Li, C., Jiao, G., Yue, G. et al. Multi-scale pooling learning for camouflaged instance segmentation. Appl Intell 54, 4062–4076 (2024). https://doi.org/10.1007/s10489-024-05369-2

Accepted: 27 February 2024
Published: 19 March 2024
Issue Date: March 2024
DOI: https://doi.org/10.1007/s10489-024-05369-2
