Multi-scale pooling learning for camouflaged instance segmentation

Applied Intelligence

Abstract

Camouflaged instance segmentation (CIS) targets instances that blend into their background. Existing CIS methods emphasize global interactions but overlook hidden cues at different scales, which leads to inaccurate recognition of camouflaged instances. To address this, we propose a multi-scale pooling network (MSPNet) that mines the hidden cues camouflaged instances offer at various scales. The network enhances the fusion of multi-scale information mainly through multilayer pooling. Specifically, the pyramid pooling transformer (P2T) serves as a robust backbone for extracting multi-scale features. We then introduce an end-to-end pooling learning transformer (PLT) to obtain instance-aware parameters and high-quality mask features. To further strengthen the fusion of the various mask features, we design a novel multi-scale complementary feature pooling (MCFP) module. We also propose an instance normalization module with fused spatial attention (FSA-IN) that combines the instance-aware parameters with the mask features to produce the final camouflaged instances. Experiments show that MSPNet surpasses existing CIS models on the COD10K-Test and NC4K datasets, with average precision (AP) scores of 49.6% and 53.4%, respectively. Our code will be published at https://github.com/another-u/MSPNet-main.
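The general idea of fusing information pooled at several scales can be illustrated with a minimal sketch. The code below is not MSPNet, P2T, PLT, or MCFP, whose details are not given in this abstract. It is a generic illustration that assumes non-overlapping average pooling, nearest-neighbour upsampling, and plain averaging as the fusion step. The function names (`avg_pool`, `upsample_nearest`, `multi_scale_pool`) and the scales `(1, 2, 4)` are hypothetical.

```python
from typing import List, Sequence

Grid = List[List[float]]


def avg_pool(grid: Grid, k: int) -> Grid:
    """Non-overlapping k x k average pooling; height and width must be divisible by k."""
    h, w = len(grid), len(grid[0])
    if h % k or w % k:
        raise ValueError("grid size must be divisible by the pooling size")
    return [
        [
            sum(grid[i + di][j + dj] for di in range(k) for dj in range(k)) / (k * k)
            for j in range(0, w, k)
        ]
        for i in range(0, h, k)
    ]


def upsample_nearest(grid: Grid, k: int) -> Grid:
    """Repeat every cell k times along both axes (nearest-neighbour upsampling)."""
    return [[v for v in row for _ in range(k)] for row in grid for _ in range(k)]


def multi_scale_pool(grid: Grid, scales: Sequence[int] = (1, 2, 4)) -> Grid:
    """Pool at each scale, upsample back to the input size, and average the branches.

    Averaging is an assumed, simplistic fusion rule chosen only for illustration.
    """
    h, w = len(grid), len(grid[0])
    branches = [upsample_nearest(avg_pool(grid, k), k) for k in scales]
    return [
        [sum(b[i][j] for b in branches) / len(branches) for j in range(w)]
        for i in range(h)
    ]
```

In this sketch, a single strong response in a 4 × 4 map contributes at full strength through the scale-1 branch, at reduced strength to its 2 × 2 neighbourhood through the scale-2 branch, and weakly to the whole map through the scale-4 branch, so each output cell mixes local and global context.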

Figs. 1–11

Data Availability

Data is available on request to the authors.

Acknowledgements

This work was supported by the Postgraduate Scientific Research Innovation Project of Hunan Province (CX20231264), the Hunan Provincial Natural Science Foundation of China (2021JJ50074, 2022JJ50016), the Science and Technology Plan Project of Hunan Province (2016TP1020) and the 14th Five-Year Plan Key Disciplines and Application-oriented Special Disciplines of Hunan Province (Xiangjiaotong [2022] 351).

Author information

Corresponding author

Correspondence to Ge Jiao.

Ethics declarations

Conflict of interests

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Li, C., Jiao, G., Yue, G. et al. Multi-scale pooling learning for camouflaged instance segmentation. Appl Intell 54, 4062–4076 (2024). https://doi.org/10.1007/s10489-024-05369-2

  • DOI: https://doi.org/10.1007/s10489-024-05369-2
