Abstract
Visual attention selects a salient subset of the visual input for further processing while ignoring redundant data. The dominant view of visual attention computation assumes that bottom-up visual saliency, such as local contrast and interest points, drives the allocation of attention during scene viewing. In this paper, however, we advocate that the deployment of attention is primarily and directly guided by objects, and we propose a novel framework for computing image visual attention by learning object attributes from eye-tracking data. We address three problems: (1) pixel-level visual attention computation (the saliency map); (2) image-level visual attention computation; and (3) the application of the computational model to image categorization. We first adopt the object bank algorithm to obtain the responses of a set of object detectors at each location in an image, forming a feature descriptor that indicates the occurrences of various objects at a pixel or in an image. To solve the first problem, we integrate the inference of interesting objects from fixations in eye-tracking data with the competition among surrounding objects. To solve the second problem, we propose a computational model that estimates the interestingness of each image via a mapping between object attributes and the inter-observer visual congruency obtained from eye-tracking data. Finally, we apply the proposed pixel-level visual attention model to the image categorization task. Comprehensive evaluations on publicly available benchmarks and comparisons with state-of-the-art methods demonstrate the effectiveness of the proposed models.
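To make the described pipeline concrete, the following is a minimal sketch, not the paper's actual implementation: it assumes per-pixel object detector responses are available (e.g., from an object bank) and combines them with hypothetical per-object interestingness weights into a saliency map, then maps a pooled image-level object descriptor to a scalar interestingness score as a stand-in for the IOVC regression. All detector responses and weights below are random placeholders.

```python
# Hypothetical sketch of pixel-level and image-level attention computation
# from object detector responses. NumPy only; responses, object weights,
# and regression weights are placeholders, not the paper's learned models.
import numpy as np

def pixel_saliency(responses, object_weights):
    """Combine per-pixel object detector responses into a saliency map.

    responses      : (H, W, K) array, responses of K object detectors per pixel
    object_weights : (K,) array, assumed per-object interestingness
                     (e.g., inferred from fixation data)
    Returns an (H, W) saliency map normalized to [0, 1].
    """
    saliency = responses @ object_weights           # weighted sum over object classes
    saliency -= saliency.min()
    return saliency / (saliency.max() + 1e-12)      # normalize for comparison/visualization

def image_interestingness(image_attributes, regression_weights, bias=0.0):
    """Map an image-level object-attribute descriptor to a scalar score
    (a linear stand-in for the object-attributes-to-IOVC mapping)."""
    return float(image_attributes @ regression_weights + bias)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    H, W, K = 120, 160, 8                           # toy image size and detector count
    responses = rng.random((H, W, K))               # placeholder detector responses
    object_weights = rng.random(K)                  # placeholder per-object interestingness
    smap = pixel_saliency(responses, object_weights)
    descriptor = responses.reshape(-1, K).max(axis=0)   # max-pooled image-level descriptor
    score = image_interestingness(descriptor, rng.random(K))
    print(smap.shape, round(score, 3))
```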
Notes
Although this paper advocates that objects, rather than visual saliency, directly guide attention, we still use the term "saliency map" for the output of pixel-level computational attention models, to be consistent with most previous work.
Acknowledgments
This work was partially supported by the National Science Foundation of China under Grants 61005018 and 91120005, NPU-FFR-JC20104, and the Program for New Century Excellent Talents in University under Grant NCET-10-0079.
Cite this article
Han, J., Wang, D., Shao, L. et al. Image visual attention computation and application via the learning of object attributes. Machine Vision and Applications 25, 1671–1683 (2014). https://doi.org/10.1007/s00138-013-0558-1