Abstract
Research on and applications of human fixation detection in the video compressed domain have gained increasing attention in recent years. However, both prediction accuracy and computational complexity remain challenging. This paper addresses compressed-domain prediction of human fixations in video via saliency detection, and presents a fast and efficient algorithm based on the Residual DCT Coefficients Norm (RDCN) and Operational Block Description Length (OBDL) features. These two features are extracted directly from the compressed bit-stream with only partial decoding and are then normalized. After spatial and temporal filtering, the normalized saliency maps are fused using coefficients that vary dynamically with the quantization parameter. The fused saliency map is then weighted by a Gaussian model whose center is determined by the feature values. The proposed saliency detection model for human fixation prediction combines the accuracy of pixel-domain saliency detection with the computational efficiency of its compressed-domain counterparts. Validation and comparison are performed with several accuracy metrics on two ground-truth datasets. Experimental results show that the proposed model outperforms several state-of-the-art compressed-domain and pixel-domain algorithms on these metrics. Computationally, our algorithm achieves a speed-up of more than 10 times over comparable algorithms, which makes it appropriate for in-camera saliency estimation.
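The pipeline described above (normalize the two feature maps, fuse them with a QP-dependent coefficient, then apply center weighting with a Gaussian placed at the feature peak) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the linear QP-to-weight rule, the `sigma_frac` parameter, and the choice of the fused-map maximum as the Gaussian center are all assumptions made for the sketch, and the spatio-temporal filtering step is presumed to have already been applied to the inputs.

```python
import numpy as np

def fuse_saliency(rdcn, obdl, qp, qp_max=51, sigma_frac=0.25):
    """Illustrative fusion + Gaussian weighting of two compressed-domain
    feature maps.

    rdcn, obdl : 2-D arrays of per-block feature values (assumed already
                 spatially and temporally filtered).
    qp         : encoder quantization parameter (0..qp_max).
    """
    # Normalize each feature map to [0, 1].
    def norm(m):
        rng = m.max() - m.min()
        return (m - m.min()) / (rng + 1e-12)

    rdcn_n, obdl_n = norm(rdcn), norm(obdl)

    # Dynamic fusion coefficient varying with QP (illustrative linear rule;
    # the paper's actual coefficients are derived differently).
    w = qp / qp_max
    fused = w * rdcn_n + (1.0 - w) * obdl_n

    # Gaussian center-weighting, centered at the fused-map maximum
    # (an assumption standing in for the paper's feature-driven center).
    cy, cx = np.unravel_index(np.argmax(fused), fused.shape)
    h, wd = fused.shape
    yy, xx = np.mgrid[0:h, 0:wd]
    sigma = sigma_frac * max(h, wd)
    g = np.exp(-((yy - cy) ** 2 + (xx - cx) ** 2) / (2.0 * sigma ** 2))
    return norm(fused * g)
```

A higher QP shifts weight toward the RDCN map in this sketch; the output is again normalized to [0, 1] so that it can be compared directly against fixation ground truth.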
Acknowledgments
This work was partially supported by the National Natural Science Foundation of China (Grant Nos. 61222101, 61301287, 61301291, 61502367).
Li, Y., Li, Y. A fast and efficient saliency detection model in video compressed-domain for human fixations prediction. Multimed Tools Appl 76, 26273–26295 (2017). https://doi.org/10.1007/s11042-016-4118-3