A fast and efficient saliency detection model in video compressed-domain for human fixations prediction


Abstract

Research on and applications of human fixation detection in the video compressed domain have gained increasing attention in recent years. However, both prediction accuracy and computational complexity remain challenging. This paper addresses compressed-domain prediction of human fixations in video via saliency detection, and presents a fast and efficient algorithm based on the Residual DCT Coefficients Norm (RDCN) feature and the Operational Block Description Length (OBDL) feature. These two features are extracted directly from the compressed bit-stream with only partial decoding and are then normalized. After spatial and temporal filtering, the normalized saliency maps are fused using dynamic fusion coefficients that vary with the quantization parameter. The fused saliency map is then weighted by a Gaussian model whose center is determined by the feature values. The proposed saliency detection model for human fixation prediction combines the accuracy of pixel-domain saliency detectors with the computational efficiency of their compressed-domain counterparts. Validation and comparison are carried out with several accuracy metrics on two ground-truth datasets. Experimental results show that the proposed model outperforms several state-of-the-art compressed-domain and pixel-domain algorithms on these metrics. Computationally, our algorithm achieves a speed-up of over 10 times compared to similar algorithms, making it suitable for in-camera saliency estimation.
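To make the pipeline concrete, the sketch below shows one way the stages described above could fit together: per-block RDCN and OBDL maps (stubbed here, since extracting them requires a partially decoded bit-stream parser) are normalized, spatially filtered, fused with a QP-dependent weight, modulated by a feature-centered Gaussian, and temporally smoothed. This is a minimal sketch under stated assumptions, not the paper's exact formulation: the linear QP weighting, the sigma choices, the exponential temporal filter, and all function names are illustrative.

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def normalize(feat):
        # Scale a per-block feature map to [0, 1].
        lo, hi = float(feat.min()), float(feat.max())
        return (feat - lo) / (hi - lo + 1e-12)

    def fuse(rdcn, obdl, qp, qp_max=51):
        # Assumption: the fusion weight varies linearly with the
        # quantization parameter, shifting toward OBDL at high QP.
        alpha = qp / qp_max
        return (1.0 - alpha) * rdcn + alpha * obdl

    def center_prior(sal, sigma_frac=0.25):
        # Gaussian prior centered at the saliency-weighted centroid
        # of the fused map (center determined by the feature values).
        h, w = sal.shape
        ys, xs = np.mgrid[0:h, 0:w].astype(float)
        total = sal.sum() + 1e-12
        cy = (ys * sal).sum() / total
        cx = (xs * sal).sum() / total
        sigma = sigma_frac * max(h, w)
        return sal * np.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (2 * sigma ** 2))

    def saliency_frame(rdcn_raw, obdl_raw, qp, prev=None,
                       spatial_sigma=1.5, beta=0.7):
        # Normalize and spatially filter each per-block feature map.
        rdcn = gaussian_filter(normalize(rdcn_raw), spatial_sigma)
        obdl = gaussian_filter(normalize(obdl_raw), spatial_sigma)
        sal = center_prior(fuse(rdcn, obdl, qp))
        # Temporal filtering: exponential smoothing against the previous frame.
        if prev is not None:
            sal = beta * sal + (1.0 - beta) * prev
        return sal

Note that every step operates on block-level maps, which are far smaller than pixel resolution; this is consistent with the abstract's claim that the compressed-domain approach retains computational efficiency.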



Acknowledgments

This work was partially supported by the National Natural Science Foundation of China (Grant Nos. 61222101, 61301287, 61301291, 61502367).

Author information


Corresponding author

Correspondence to Yunsong Li.

About this article


Cite this article

Li, Y., Li, Y. A fast and efficient saliency detection model in video compressed-domain for human fixations prediction. Multimed Tools Appl 76, 26273–26295 (2017). https://doi.org/10.1007/s11042-016-4118-3

