Exploring class-agnostic pixels for scribble-supervised high-resolution salient object detection

  • Original Article
Neural Computing and Applications

Abstract

Successful salient object detection depends largely on large-scale datasets with fine-grained annotations. However, pixel-level annotation is laborious compared with weak labelling, and little research has addressed high-resolution images. To mitigate these drawbacks, we propose a network that detects salient objects in high-resolution images under scribble supervision, and we relabel an existing high-resolution dataset with scribbles, yielding Scr-HRSOD, in which each image is labelled in a few seconds. Since scribble labels lack structural information about objects, a boundary structure maintenance branch with shallow layers is introduced to capture low-level spatial details. Under the constraint of the boundary branch, a lightweight contextual semantic branch processes compressed inputs to obtain high-level semantic context and iteratively propagates the partially annotated pixels to surrounding similar regions, which are then employed as pseudo-labels to supervise the network. Extensive evaluations on five datasets demonstrate the effectiveness of the proposed method. On the HRSOD datasets, we achieve an Fmax of 0.861 and an Sm of 0.887, outperforming the leading existing weakly supervised methods and even the fully supervised methods.
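
The iterative propagation of scribble pixels to similar surrounding regions can be illustrated with a small sketch. This is not the authors' implementation: it assumes a plain region-growing rule in which an unlabelled pixel takes the label of a 4-connected neighbour when their feature vectors are closer than a threshold. The names `propagate_scribbles`, `UNLABELED`, `threshold` and `iterations` are hypothetical, and the similarity measure used in the paper may differ.

```python
import numpy as np

UNLABELED = -1  # hypothetical marker for pixels that carry no scribble


def _shifted_views(arr, dy, dx):
    """Return (dst, src) views such that src[y, x] is the (dy, dx) neighbour of dst[y, x]."""
    h, w = arr.shape[:2]
    dst = arr[max(0, -dy):h - max(0, dy), max(0, -dx):w - max(0, dx)]
    src = arr[max(0, dy):h - max(0, -dy), max(0, dx):w - max(0, -dx)]
    return dst, src


def propagate_scribbles(features, labels, threshold=0.1, iterations=10):
    """Grow scribble labels into similar 4-connected neighbours (illustrative only).

    features: (H, W, C) float array, e.g. colours or network features.
    labels:   (H, W) int array; 1 = salient, 0 = background, -1 = unlabelled.
    Returns a new (H, W) array that can serve as a pseudo-label map.
    """
    labels = labels.copy()  # leave the caller's scribble map untouched
    shifts = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    for _ in range(iterations):
        updated = labels.copy()
        for dy, dx in shifts:
            dst_lab, src_lab = _shifted_views(labels, dy, dx)
            dst_feat, src_feat = _shifted_views(features, dy, dx)
            dst_new, _ = _shifted_views(updated, dy, dx)
            # Euclidean feature distance between each pixel and its neighbour
            dist = np.linalg.norm(dst_feat - src_feat, axis=-1)
            # A pixel is filled once per iteration; the first matching neighbour wins
            grow = (
                (dst_lab == UNLABELED)
                & (dst_new == UNLABELED)
                & (src_lab != UNLABELED)
                & (dist < threshold)
            )
            dst_new[grow] = src_lab[grow]  # writes through the view into `updated`
        if np.array_equal(updated, labels):
            break  # nothing changed, propagation has converged
        labels = updated
    return labels
```

Pixels that stay at `-1` after propagation would simply be ignored by a loss computed on labelled pixels only; that choice is likewise an assumption of this sketch.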

Notes

  1. The scribbles were drawn with the scribble annotation tool of the Image Labeler app in MATLAB R2019b.

  2. The Scr-HRSOD dataset is publicly available at https://github.com/YQP-CV/Scribble-Supervised-HRSOD, and our code is about to be open-sourced.

Acknowledgements

This research is supported by the National Natural Science Foundation of China (Nos. 62002100, 61802111 and 62176088) and the Science and Technology Foundation of Henan Province of China (No. 212102210156).

Author information

Corresponding author

Correspondence to Jun Wang.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Yang, Q., Zhou, Y., Chai, X. et al. Exploring class-agnostic pixels for scribble-supervised high-resolution salient object detection. Neural Comput & Applic 35, 3469–3482 (2023). https://doi.org/10.1007/s00521-022-07915-w

  • DOI: https://doi.org/10.1007/s00521-022-07915-w
