Visual Saliency Detection Using Group Lasso Regularization in Videos of Natural Scenes

Souly, Nasim; Shah, Mubarak

doi:10.1007/s11263-015-0853-6

Published: 15 August 2015

International Journal of Computer Vision, Volume 117, pages 93–110 (2016)

Abstract

Visual saliency is the ability of a vision system to promptly select the most relevant data in a scene and thereby reduce the amount of visual data that must be processed. Its applications to complex tasks such as object detection, object recognition and video compression have therefore attracted interest in computer vision. In this paper, we introduce a novel unsupervised method for detecting visual saliency in videos of natural scenes. We divide a video into non-overlapping cuboids and create a matrix whose columns correspond to the intensity values of these cuboids. In parallel, we segment the video with a hierarchical segmentation method to obtain super-voxels. A dictionary learned from the feature matrix of the video is then used to represent the video as coefficients over its atoms, and these coefficients are decomposed into salient and non-salient parts. We propose group lasso regularization to find the sparse representation of a video, which exploits the grouping information provided by the super-voxels together with the features extracted from the cuboids. Salient regions are found by decomposing the feature matrix of the video into low-rank and sparse matrices with the robust principal component analysis (RPCA) matrix recovery method. We test the method on four video datasets of natural scenes, where it gives promising results in predicting eye movements under standard evaluation methods. In addition, we show that our video saliency can improve the performance of human action recognition on a standard dataset.
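
The pipeline in the abstract combines three standard building blocks: stacking cuboid intensities into a feature matrix, sparse coding with a group lasso penalty, and a low-rank plus sparse (RPCA) decomposition. The NumPy sketch below illustrates each one under stated assumptions; it is not the authors' implementation. The function names (`cuboid_matrix`, `group_lasso_code`, `rpca_ialm`, `cuboid_saliency`) are hypothetical. The dictionary `D` is assumed to be given (the paper learns it from the data), the super-voxel grouping is assumed to arrive as lists of coefficient indices, and the group lasso is solved with plain proximal gradient (ISTA) rather than the solver used in the paper. `rpca_ialm` follows the inexact augmented Lagrange multiplier scheme of Lin, Chen & Ma (2010) with the common default λ = 1/√max(m, n). Scoring each column by the ℓ2 norm of its sparse part is an illustrative choice, not necessarily the paper's saliency measure.

```python
import numpy as np


def cuboid_matrix(video, size):
    """Stack non-overlapping (t, h, w) cuboids of a grayscale video as columns.

    video: array of shape (T, H, W). Frames/pixels that do not fill a whole
    cuboid are cropped (an assumption of this sketch, not taken from the paper).
    """
    t, h, w = size
    T, H, W = (d - d % s for d, s in zip(video.shape, size))
    v = video[:T, :H, :W]
    # Split each axis into (num_blocks, block_len), then put block indices first.
    v = v.reshape(T // t, t, H // h, h, W // w, w).transpose(0, 2, 4, 1, 3, 5)
    return v.reshape(-1, t * h * w).T  # shape (t*h*w, number_of_cuboids)


def group_soft_threshold(x, groups, thresh):
    """Proximal operator of thresh * sum_g ||x_g||_2 (block soft-thresholding).

    groups: list of index lists, assumed to partition the entries of x
    (hypothetically, one group per super-voxel).
    """
    out = np.zeros_like(x, dtype=float)
    for g in groups:
        norm = np.linalg.norm(x[g])
        if norm > thresh:  # groups with small norm are zeroed as a whole
            out[g] = (1.0 - thresh / norm) * x[g]
    return out


def group_lasso_code(D, y, groups, lam, n_iter=500):
    """ISTA for min_a 0.5 * ||y - D a||^2 + lam * sum_g ||a_g||_2."""
    step = 1.0 / np.linalg.norm(D, 2) ** 2  # 1 / Lipschitz constant of the gradient
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ a - y)
        a = group_soft_threshold(a - step * grad, groups, step * lam)
    return a


def soft_threshold(X, thresh):
    """Entrywise soft-thresholding (proximal operator of thresh * ||X||_1)."""
    return np.sign(X) * np.maximum(np.abs(X) - thresh, 0.0)


def rpca_ialm(M, lam=None, tol=1e-7, max_iter=500):
    """Split M into low-rank L plus sparse S via inexact ALM."""
    m, n = M.shape
    lam = 1.0 / np.sqrt(max(m, n)) if lam is None else lam
    norm_M = np.linalg.norm(M)  # Frobenius norm
    spec = np.linalg.norm(M, 2)  # largest singular value
    Y = M / max(spec, np.abs(M).max() / lam)  # dual variable initialisation
    mu, mu_max, rho = 1.25 / spec, 1.25e7 / spec, 1.5
    L = np.zeros_like(M)
    S = np.zeros_like(M)
    for _ in range(max_iter):
        # Low-rank update: singular value thresholding.
        U, sig, Vt = np.linalg.svd(M - S + Y / mu, full_matrices=False)
        L = (U * np.maximum(sig - 1.0 / mu, 0.0)) @ Vt
        # Sparse update: entrywise soft-thresholding.
        S = soft_threshold(M - L + Y / mu, lam / mu)
        Z = M - L - S
        Y = Y + mu * Z
        mu = min(mu * rho, mu_max)
        if np.linalg.norm(Z) / norm_M < tol:
            break
    return L, S


def cuboid_saliency(S):
    """Illustrative score per column: l2 norm of its sparse residual."""
    return np.linalg.norm(S, axis=0)
```

For example, `L, S = rpca_ialm(X)` on a float matrix whose columns are cuboid features (or their sparse codes) gives `cuboid_saliency(S)`, one score per cuboid; columns that the shared low-rank background explains well receive scores near zero.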

Acknowledgments

This work was supported by the Intelligence Advanced Research Projects Activity (IARPA) via Department of Interior National Business Center contract number D11PC20066. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright annotation thereon. Disclaimer: The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of IARPA, DoI/NBC, or the U.S. Government.

Author information

Authors and Affiliations

The Center for Research in Computer Vision, 4000 Central Florida Blvd., Orlando, Florida, 32816, USA
Nasim Souly & Mubarak Shah

Corresponding author

Correspondence to Nasim Souly.

Additional information

Communicated by Jakob Verbeek.

About this article

Cite this article

Souly, N., Shah, M. Visual Saliency Detection Using Group Lasso Regularization in Videos of Natural Scenes. Int J Comput Vis 117, 93–110 (2016). https://doi.org/10.1007/s11263-015-0853-6

Received: 06 August 2014
Accepted: 31 July 2015
Published: 15 August 2015
Issue Date: March 2016
DOI: https://doi.org/10.1007/s11263-015-0853-6
