Abstract
In this paper, we present a probabilistic multi-task learning approach for visual saliency estimation in video. Our approach models visual saliency estimation by simultaneously considering stimulus-driven and task-related factors in a probabilistic framework. In this framework, a stimulus-driven component simulates the low-level processes in the human visual system using multi-scale wavelet decomposition and unbiased feature competition, while a task-related component simulates the high-level processes that bias the competition among input features. Unlike existing approaches, we propose a multi-task learning algorithm that learns a task-related "stimulus-saliency" mapping function for each scene. The algorithm also learns various fusion strategies, which are used to integrate the stimulus-driven and task-related components into the final visual saliency. Extensive experiments were carried out on two public eye-fixation datasets and one regional saliency dataset. Experimental results show that our approach remarkably outperforms eight state-of-the-art approaches.
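The two-component structure described above can be illustrated with a minimal sketch. This is not the paper's actual formulation: the normalization, the linear per-scene mapping `w`, and the scalar fusion weight `lam` are all simplifying assumptions standing in for the probabilistic framework and the learned fusion strategies.

```python
import numpy as np

def stimulus_driven_map(features):
    # Low-level component: normalize each feature channel so that no
    # channel dominates a priori ("unbiased" competition), then average.
    norm = [(f - f.min()) / (f.max() - f.min() + 1e-8) for f in features]
    return np.mean(norm, axis=0)

def task_related_map(features, w):
    # High-level component: a per-scene weight vector w (assumed learned
    # elsewhere) biases the competition among the same feature channels.
    stacked = np.stack(features, axis=0)        # (C, H, W)
    biased = np.tensordot(w, stacked, axes=1)   # weighted sum over channels
    return np.clip(biased, 0.0, None)

def fuse(s_map, t_map, lam):
    # A convex combination is one plausible fusion strategy; the paper
    # learns the fusion per scene, which this single scalar only mimics.
    return lam * s_map + (1.0 - lam) * t_map

# Hypothetical usage on three random feature channels of a 4x4 frame.
feats = [np.random.rand(4, 4) for _ in range(3)]
saliency = fuse(stimulus_driven_map(feats),
                task_related_map(feats, np.array([0.2, 0.5, 0.3])),
                lam=0.5)
```

In a full system the feature channels would come from the multi-scale wavelet decomposition of each video frame, and `w` and `lam` would be fitted per scene by the multi-task learner.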
Cite this article
Li, J., Tian, Y., Huang, T. et al. Probabilistic Multi-Task Learning for Visual Saliency Estimation in Video. Int J Comput Vis 90, 150–165 (2010). https://doi.org/10.1007/s11263-010-0354-6