Abstract
Camouflaged object segmentation (COS) is a very challenging task due to the deceptive similarity of candidate objects to their noisy backgrounds. Most existing state-of-the-art methods mimic the first-positioning-then-focus mechanism of predators, but still fail to position camouflaged objects in cluttered scenes or to delineate their boundaries. The key reason is that these methods lack a comprehensive understanding of the scene when they spot and focus on the objects, and are thus easily distracted by local surroundings. An ideal COS model should process local and global information simultaneously, i.e., maintain omni perception of the scene throughout the whole process of camouflaged object segmentation. To this end, we propose to learn omni perception for the first-positioning-then-focus COS scheme. Specifically, we propose an omni perception network (OPNet) with two novel modules, i.e., the pyramid positioning module (PPM) and the dual focus module (DFM). They integrate local features and global representations to accurately position the camouflaged objects and to focus on their boundaries, respectively. Extensive experiments demonstrate that our method, which runs at 54 fps, significantly outperforms 15 cutting-edge models on 4 challenging datasets under 4 standard metrics. The code will be made publicly available.
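The two-stage idea above can be illustrated with a minimal NumPy sketch. This is not the authors' OPNet: the `pyramid_position` and `dual_focus` functions below are hypothetical stand-ins that only mimic the stated roles of the PPM (mixing multi-scale global context into local features to coarsely position the object) and the DFM (emphasizing boundary regions of the coarse map); all pooling scales and the boundary-contrast heuristic are assumptions for illustration.

```python
import numpy as np

def pyramid_position(feat, scales=(1, 2, 4)):
    """Hypothetical PPM sketch: pool global context at several pyramid
    scales and mix it back into the local features, then collapse the
    channels into a coarse positioning map."""
    h, w, c = feat.shape
    mixed = feat.copy()
    for s in scales:
        # average-pool the feature map into an s x s grid of context vectors
        pooled = feat.reshape(s, h // s, s, w // s, c).mean(axis=(1, 3))
        # broadcast each cell's context vector back over its region
        up = np.repeat(np.repeat(pooled, h // s, axis=0), w // s, axis=1)
        mixed = mixed + up
    return mixed.mean(axis=-1)  # coarse positioning map of shape (h, w)

def dual_focus(position_map):
    """Hypothetical DFM sketch: emphasize boundaries by contrasting each
    pixel with the mean of its 3x3 neighbourhood."""
    p = position_map
    pad = np.pad(p, 1, mode="edge")
    neigh = sum(pad[di:di + p.shape[0], dj:dj + p.shape[1]]
                for di in range(3) for dj in range(3)) / 9.0
    edge = np.abs(p - neigh)   # large where the map changes quickly
    return p + edge            # boost boundary regions of the coarse map

feat = np.random.rand(8, 8, 4)       # toy feature map (H, W, C)
coarse = pyramid_position(feat)      # step 1: position
refined = dual_focus(coarse)         # step 2: focus on boundaries
print(coarse.shape, refined.shape)   # (8, 8) (8, 8)
```

The sketch only conveys the data flow (features → coarse position map → boundary-refined map); in the actual network both modules are learned, and the paper reports that this pipeline runs at 54 fps.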
Acknowledgements
This work was supported in part by the National Key Research and Development Program of China (2022ZD0210500), the National Natural Science Foundation of China under Grants 61972067/U21A20491, and the Distinguished Young Scholars Funding of Dalian (No. 2022RJ01).
Additional information
Communicated by Karteek Alahari, Ph.D.
About this article
Cite this article
Mei, H., Xu, K., Zhou, Y. et al. Camouflaged Object Segmentation with Omni Perception. Int J Comput Vis 131, 3019–3034 (2023). https://doi.org/10.1007/s11263-023-01838-2