Abstract
Camouflaged instance segmentation (CIS) aims to segment instances that blend into their background. Existing CIS methods emphasize global interactions but overlook hidden cues at different scales, which leads to inaccurate recognition of camouflaged instances. To address this, we propose a multi-scale pooling network (MSPNet) that mines the hidden cues camouflaged instances offer at various scales. The network fuses multi-scale information mainly through multilayer pooling. Specifically, the pyramid pooling transformer (P2T) serves as a robust backbone for extracting multi-scale features. We then introduce an end-to-end pooling learning transformer (PLT) to obtain instance-aware parameters and high-quality mask features. To further strengthen the fusion of the mask features, we design a novel multi-scale complementary feature pooling (MCFP) module. We also propose an instance normalization module with fused spatial attention (FSA-IN) that combines the instance-aware parameters with the mask features to produce the final camouflaged instances. Experiments show that MSPNet surpasses existing CIS models on the COD10K-Test and NC4K datasets, with average precision (AP) scores of 49.6% and 53.4%, respectively. Our code will be published at https://github.com/another-u/MSPNet-main.
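To make the idea of fusing information pooled at several scales more concrete, the following is a minimal NumPy sketch of generic multi-scale average pooling followed by fusion. It is an illustration only and is not the authors' P2T, PLT, or MCFP implementation: the function names (`avg_pool2d`, `upsample_nearest`, `multi_scale_pool_fuse`), the pooling ratios, and the choice of plain averaging as the fusion step are all assumptions made for this example.

```python
import numpy as np


def avg_pool2d(x: np.ndarray, k: int) -> np.ndarray:
    """Non-overlapping k x k average pooling over an (H, W, C) feature map.

    Assumes H and W are divisible by k (an assumption of this sketch).
    """
    h, w, c = x.shape
    # Split each spatial axis into (blocks, k) and average inside each block.
    return x.reshape(h // k, k, w // k, k, c).mean(axis=(1, 3))


def upsample_nearest(x: np.ndarray, k: int) -> np.ndarray:
    """Nearest-neighbour upsampling by an integer factor k along H and W."""
    return x.repeat(k, axis=0).repeat(k, axis=1)


def multi_scale_pool_fuse(x: np.ndarray, ratios=(1, 2, 4)) -> np.ndarray:
    """Pool x at several ratios, restore the resolution, and average the results.

    Averaging is a hypothetical stand-in for a learned fusion step.
    """
    pooled = [upsample_nearest(avg_pool2d(x, k), k) for k in ratios]
    return np.mean(pooled, axis=0)


if __name__ == "__main__":
    features = np.random.rand(8, 8, 16)  # toy (H, W, C) feature map
    fused = multi_scale_pool_fuse(features)
    print(fused.shape)
```

Each pooled copy summarizes the feature map at a coarser granularity, so the fused output mixes fine detail (ratio 1) with broader context (larger ratios). A learned model would typically replace the plain average with trainable layers.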











Data Availability
Data is available on request to the authors.
Acknowledgements
This work was supported by the Postgraduate Scientific Research Innovation Project of Hunan Province (CX20231264), the Hunan Provincial Natural Science Foundation of China (2021JJ50074, 2022JJ50016), the Science and Technology Plan Project of Hunan Province (2016TP1020) and the 14th Five-Year Plan Key Disciplines and Application-oriented Special Disciplines of Hunan Province (Xiangjiaotong [2022] 351).
Ethics declarations
Conflict of interests
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
About this article
Cite this article
Li, C., Jiao, G., Yue, G. et al. Multi-scale pooling learning for camouflaged instance segmentation. Appl Intell 54, 4062–4076 (2024). https://doi.org/10.1007/s10489-024-05369-2
DOI: https://doi.org/10.1007/s10489-024-05369-2