
Video-based salient object detection via spatio-temporal difference and coherence

Multimedia Tools and Applications

Abstract

Salient object detection aims to extract the attractive objects in images and videos. It supports various robotics tasks and multimedia applications, such as object detection, action recognition, and scene analysis. However, detecting salient objects efficiently in videos remains more challenging than in still images. In this paper, we propose a novel video-based salient object detection method that exploits two spatio-temporal characteristics of video content: spatio-temporal difference and spatio-temporal coherence. First, we initialize the saliency map of each keyframe by deriving the spatio-temporal difference from color and motion cues. Next, we generate the saliency maps of the remaining frames by propagating saliency within and across frames under the constraint of spatio-temporal coherence. Finally, the saliency maps of both keyframes and non-keyframes are refined during saliency propagation. In this way, salient objects in videos can be detected efficiently. We evaluate the proposed method on two public datasets, SegTrackV2 and UVSD. The experimental results show that our method outperforms state-of-the-art methods when both effectiveness and efficiency are taken into account.
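The two stages named in the abstract (initializing keyframe saliency from color and motion cues, then propagating it to other frames under a coherence constraint) can be illustrated with a rough sketch. The code below is not the authors' algorithm, whose details are not given in the abstract. It assumes a global color-contrast cue, a plain frame difference in place of an optical-flow motion cue, and a Gaussian appearance-stability weight for propagation. All function names and the parameters `alpha` and `sigma` are hypothetical.

```python
import numpy as np

def normalize(m):
    """Rescale a map to [0, 1]; a constant map becomes all zeros."""
    span = m.max() - m.min()
    return (m - m.min()) / span if span > 0 else np.zeros_like(m)

def color_contrast(frame):
    """Per-pixel distance from the frame's mean color (assumed color cue)."""
    mean_color = frame.reshape(-1, frame.shape[-1]).mean(axis=0)
    return np.linalg.norm(frame - mean_color, axis=-1)

def motion_contrast(prev_frame, frame):
    """Absolute grey-level frame difference (stand-in for a motion cue)."""
    return np.abs(frame.mean(axis=-1) - prev_frame.mean(axis=-1))

def keyframe_saliency(prev_frame, frame, alpha=0.5):
    """Blend the normalized color and motion cues; alpha is a hypothetical weight."""
    color = normalize(color_contrast(frame))
    motion = normalize(motion_contrast(prev_frame, frame))
    return normalize(alpha * color + (1.0 - alpha) * motion)

def propagate(prev_saliency, prev_frame, frame, sigma=0.1):
    """Carry saliency to the next frame where appearance is stable.

    Pixels whose color barely changed keep the previous saliency; pixels
    that changed fall back to the new frame's own color cue.
    """
    change = np.linalg.norm(frame - prev_frame, axis=-1)
    weight = np.exp(-(change ** 2) / (2.0 * sigma ** 2))
    own_cue = normalize(color_contrast(frame))
    return weight * prev_saliency + (1.0 - weight) * own_cue

# Toy example: a red 2x2 square moves one pixel right on a grey background.
background = np.full((8, 8, 3), 0.2)
frame_a = background.copy()
frame_a[2:4, 2:4] = [1.0, 0.0, 0.0]
frame_b = background.copy()
frame_b[2:4, 3:5] = [1.0, 0.0, 0.0]

saliency_b = keyframe_saliency(frame_a, frame_b)
saliency_c = propagate(saliency_b, frame_b, frame_b)  # static next frame
```

In this toy case the pixels the square has just moved onto score highest, because both cues fire there, while the static background scores zero. A practical system would more likely work on superpixels and use optical flow for the motion cue.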



Acknowledgements

The authors would like to thank the anonymous reviewers for their helpful suggestions. This work is supported by the National Science Foundation of China (61202320) and the Research Project of Excellent State Key Laboratory (61223003).

Author information

Corresponding author

Correspondence to Bin Luo.

About this article

Cite this article

Huang, L., Luo, B. Video-based salient object detection via spatio-temporal difference and coherence. Multimed Tools Appl 77, 10685–10699 (2018). https://doi.org/10.1007/s11042-017-4822-7

  • DOI: https://doi.org/10.1007/s11042-017-4822-7
