Abstract
Research on and applications of human fixation detection in the video compressed domain have gained increasing attention in recent years. However, both prediction accuracy and computational complexity remain challenging. This paper addresses compressed-domain prediction of human fixations in video via saliency detection, and presents a fast and efficient algorithm based on the Residual DCT Coefficients Norm (RDCN) and Operational Block Description Length (OBDL) features. These two features are extracted directly from the compressed bit-stream with only partial decoding and are then normalized. After spatial and temporal filtering, the normalized saliency maps are fused using coefficients that vary dynamically with the quantization parameter. The fused saliency map is then weighted by a Gaussian model whose center is determined by the feature values. The proposed saliency detection model for human fixation prediction combines the accuracy of pixel-domain saliency detection with the computational efficiency of its compressed-domain counterparts. Validation and comparison are performed with several accuracy metrics on two ground-truth datasets. Experimental results show that the proposed model outperforms several state-of-the-art compressed-domain and pixel-domain algorithms on these metrics. Computationally, our algorithm achieves a speed-up of more than 10 times over comparable algorithms, which makes it appropriate for in-camera saliency estimation.
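The pipeline described above (normalize the two feature maps, fuse them with a QP-dependent coefficient, then apply center weighting with a Gaussian placed at the feature peak) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the linear QP-to-weight rule, the `sigma_frac` parameter, and the choice of the fused-map maximum as the Gaussian center are all assumptions made for the sketch, and the spatio-temporal filtering step is presumed to have already been applied to the inputs.

```python
import numpy as np

def fuse_saliency(rdcn, obdl, qp, qp_max=51, sigma_frac=0.25):
    """Illustrative fusion + Gaussian weighting of two compressed-domain
    feature maps.

    rdcn, obdl : 2-D arrays of per-block feature values (assumed already
                 spatially and temporally filtered).
    qp         : encoder quantization parameter (0..qp_max).
    """
    # Normalize each feature map to [0, 1].
    def norm(m):
        rng = m.max() - m.min()
        return (m - m.min()) / (rng + 1e-12)

    rdcn_n, obdl_n = norm(rdcn), norm(obdl)

    # Dynamic fusion coefficient varying with QP (illustrative linear rule;
    # the paper's actual coefficients are derived differently).
    w = qp / qp_max
    fused = w * rdcn_n + (1.0 - w) * obdl_n

    # Gaussian center-weighting, centered at the fused-map maximum
    # (an assumption standing in for the paper's feature-driven center).
    cy, cx = np.unravel_index(np.argmax(fused), fused.shape)
    h, wd = fused.shape
    yy, xx = np.mgrid[0:h, 0:wd]
    sigma = sigma_frac * max(h, wd)
    g = np.exp(-((yy - cy) ** 2 + (xx - cx) ** 2) / (2.0 * sigma ** 2))
    return norm(fused * g)
```

A higher QP shifts weight toward the RDCN map in this sketch; the output is again normalized to [0, 1] so that it can be compared directly against fixation ground truth.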
Acknowledgments
This work was partially supported by the National Natural Science Foundation of China (Grant Nos. 61222101, 61301287, 61301291, 61502367).
Li, Y., Li, Y. A fast and efficient saliency detection model in video compressed-domain for human fixations prediction. Multimed Tools Appl 76, 26273–26295 (2017). https://doi.org/10.1007/s11042-016-4118-3