
Top-Down Saliency Detection via Contextual Pooling

Published in the Journal of Signal Processing Systems.

Abstract

The goal-driven top-down mechanism plays an important role in object detection and recognition. In this paper, we propose a top-down computational model for goal-driven saliency detection based on the coding-based classification framework. It consists of four successive steps: feature extraction, descriptor coding, contextual pooling and saliency prediction. In particular, we investigate the effect of spatial context information on saliency detection, and propose a block-wise spatial pooling operation that exploits contextual cues at multiple neighborhood scales and orientations. Experimental results on three datasets demonstrate that our method effectively exploits contextual information and achieves state-of-the-art performance on the top-down saliency detection task.
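
The four-step pipeline named in the abstract can be pictured with a short sketch: code local descriptors on a spatial grid, pool the codes over neighboring blocks at several scales and orientations, and score each location with a linear model. The NumPy sketch below is only an illustration of this idea under stated assumptions (a grid of pre-computed codes; the radii, the 8 directions and max pooling are our own illustrative choices); it is not the authors' implementation.

    # Minimal sketch: descriptor coding -> block-wise contextual pooling ->
    # linear saliency prediction. Assumes `codes` is an (H, W, K) grid of
    # already-coded local descriptors (e.g. LLC codes of dense SIFT); radii,
    # directions and the max-pooling operator are illustrative assumptions.
    import numpy as np

    def contextual_pool(codes, radii=(1, 2)):
        """For every grid cell, max-pool coded descriptors along 8 orientations
        within each radius (the neighborhood scales), then concatenate the
        pooled blocks with the cell's own code."""
        H, W, K = codes.shape
        m = max(radii)
        padded = np.pad(codes, ((m, m), (m, m), (0, 0)), mode="edge")
        directions = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                      if (dy, dx) != (0, 0)]
        feats = [codes]  # the center block itself
        for r in radii:
            for dy, dx in directions:
                # cells lying in direction (dy, dx) at distances 1..r
                block = np.stack([padded[m + dy * s:m + dy * s + H,
                                         m + dx * s:m + dx * s + W, :]
                                  for s in range(1, r + 1)])
                feats.append(block.max(axis=0))
        return np.concatenate(feats, axis=-1)  # (H, W, K * (1 + 8 * len(radii)))

    def predict_saliency(codes, w, b=0.0):
        """Score every cell with a linear model (e.g. trained with LibLinear [34])
        on its contextual feature to obtain a coarse saliency map."""
        return contextual_pool(codes) @ w + b

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        codes = rng.random((16, 16, 32))           # stand-in for coded descriptors
        w = rng.standard_normal(32 * (1 + 8 * 2))  # matches radii=(1, 2)
        print(predict_saliency(codes, w).shape)    # (16, 16)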

Notes

  1. http://lear.inrialpes.fr/data

  2. http://labelme.csail.mit.edu

  3. We use the code of [32] for calculating the AP value; a minimal sketch of the metric is given after these notes.

  4. We adopt the implementation by LibLinear code package [34].

  5. We obtain the saliency maps for JCDL from the pre-trained models provided by the authors of [20], which can be downloaded from http://faculty.ucmerced.edu/mhyang/pubs.html. For the top-down saliency models in [16] and [13], we produce the corresponding saliency maps using the code from the authors' websites: http://ilab.usc.edu/toolkit/downloads.shtml and http://people.csail.mit.edu/tjudd/WherePeopleLook/index.html, respectively.
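
As a companion to note 3: the evaluation there relies on the PASCAL VOC code of [32]. The snippet below is only a minimal, non-interpolated sketch of pixel-level average precision against a binary ground-truth mask, not the VOC implementation.

    # Minimal sketch of pixel-level average precision (AP) for a saliency map;
    # the paper uses the PASCAL VOC code of [32], so treat this non-interpolated
    # version purely as a stand-in for the metric.
    import numpy as np

    def average_precision(saliency, gt_mask):
        """AP of saliency scores with respect to a binary ground-truth mask."""
        scores = saliency.ravel()
        labels = gt_mask.ravel().astype(bool)
        order = np.argsort(-scores)                    # rank pixels by saliency
        tp = np.cumsum(labels[order])
        precision = tp / np.arange(1, scores.size + 1)
        recall = tp / max(labels.sum(), 1)
        # area under the precision-recall curve (step-wise integration)
        return float(np.sum(np.diff(np.concatenate(([0.0], recall))) * precision))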

References

  1. Gao, D., Han, S., Vasconcelos, N. (2009). Discriminant saliency, the detection of suspicious coincidences, and applications to visual recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31, 989–1005.

  2. Lee, Y.B., & Lee, S. (2011). Robust face detection based on knowledge-directed specification of bottom-up saliency. ETRI Journal, 33(4), 600–610.

  3. Achanta, R., & Susstrunk, S. (2009). Saliency detection for content-aware image resizing. In Proceedings of IEEE international conference on image processing (pp. 1005–1008).

  4. Guo, C., & Zhang, L. (2010). A novel multiresolution spatiotemporal saliency detection model and its applications in image and video compression. IEEE Transactions on Image Processing, 19(1), 185–198.

  5. Ma, Q., & Zhang, L. (2008). Image quality assessment with visual attention. In Proceedings of international conference on pattern recognition.

  6. Loupias, E., Sebe, N., Bres, S., Jolion, J.M. (2000). Wavelet-based salient points for image retrieval. In Proceedings of IEEE international conference on image processing (pp. 518–521).

  7. Itti, L., Koch, C., Niebur, E. (1998). A model of saliency-based visual attention for rapid scene analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20, 1254–1259.

  8. Harel, J., Koch, C., Perona, P. (2006). Graph-based visual saliency. In Proceedings of advances in neural information processing systems (pp. 545–552).

  9. Bruce, N., & Tsotsos, J. (2006). Saliency based on information maximization. In Proceedings of advances in neural information processing systems (pp. 155–162).

  10. Hou, X., & Zhang, L. (2007). Saliency detection: a spectral residual approach. In Proceedings of IEEE international conference on computer vision and pattern recognition (pp. 2280–2287).

  11. Hou, X., & Zhang, L. (2008). Dynamic visual attention: Searching for coding length increments. In Proceedings of advances in neural information processing systems (pp. 681–688).

  12. Achanta, R., Hemami, S., Estrada, F., Susstrunk, S. (2009). Frequency-tuned salient region detection. In Proceedings of IEEE international conference on computer vision and pattern recognition (Vol. 1–4, pp. 1597–1604).

  13. Judd, T., Ehinger, K., Durand, F., Torralba, A. (2009). Learning to predict where humans look. In Proceedings of IEEE international conference on computer vision (pp. 2106–2113).

  14. Cheng, M.M., Zhang, G.X., Mitra, N.J., Huang, X.L., Hu, S.M. (2011). Global contrast based salient region detection. In Proceedings of IEEE international conference on computer vision and pattern recognition (pp. 409–416).

  15. Zhou, Q., Li, N.Y., Yang, Y., Chen, P., Liu, W.Y. (2012). Corner-surround contrast for saliency detection. In Proceedings of IEEE international conference on pattern recognition (pp. 1–4).

  16. Navalpakkam, V., & Itti, L. (2006). An integrated model of top-down and bottom-up attention for optimizing detection speed. In Proceedings of IEEE international conference on computer vision and pattern recognition (pp. 2049–2056).

  17. Frintrop, S., Backer, G., Rome, E. (2005). Goal-directed search with a top-down modulated computational attention system. In Proceedings of pattern recognition, LNCS (Vol. 3663, pp. 117–124).

  18. Oliva, A., Torralba, A., Castelhano, M.S., Henderson, J.M. (2003). Top-down control of visual attention in object detection. In Proceedings of IEEE international conference on image processing (pp. 253–256).

  19. Liu, T., Yuan, Z., Sun, J., Wang, J., Zheng, N., Tang, X., Shum, H.-Y. (2011). Learning to detect a salient object. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(2), 353–367.

  20. Yang, J., & Yang, M.-H. (2012). Top-down visual saliency via joint CRF and dictionary learning. In Proceedings of IEEE international conference on computer vision and pattern recognition.

  21. Wang, J., Yang, J., Yu, K., Lv, F., Huang, T., Gong, Y. (2010). Locality-constrained linear coding for image classification. In Proceedings of IEEE international conference on computer vision and pattern recognition (pp. 3360–3367).

  22. Yang, J., Yu, K., Gong, Y., Huang, T. (2009). Linear spatial pyramid matching using sparse coding for image classification. In Proceedings of IEEE international conference on computer vision and pattern recognition.

  23. Zhu, J., Zou, W., Yang, X., Zhang, R., Zhou, Q., Zhang, W. (2012). Image classification by hierarchical spatial pooling with partial least squares analysis. In Proceedings of British machine vision conference.

  24. Lowe, D.G. (2004). Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60, 91–110.

  25. Yu, K., Zhang, T., Gong, Y. (2009). Nonlinear learning using local coordinate coding. In Proceedings of advances in neural information processing systems (pp. 2223–2231).

  26. Torralba, A., Oliva, A., Castelhano, M.S., Henderson, J.M. (2006). Contextual guidance of eye movements and attention in real-world scenes: the role of global features in object search. Psychological Review, 113, 766–786.

  27. Hong, R., Wang, M., Xu, M., Yan, S., Chua, T.-S. (2010). Dynamic captioning: video accessibility enhancement for hearing impairment. In Proceedings of ACM multimedia (pp. 421–430).

  28. Wang, M., Hong, R., Yuan, X.-T., Yan, S., Chua, T.-S. (2012). Movie2Comics: towards a lively video content presentation. IEEE Transactions on Multimedia, 14, 858–870.

  29. Carbonetto, P., Dorkó, G., Schmid, C., Kück, H., de Freitas, N. (2008). Learning to recognize objects with little supervision. International Journal of Computer Vision, 77, 219–237.

  30. Russell, B., Torralba, A., Murphy, K., Freeman, W.T. (2007). LabelMe: a database and web-based tool for image annotation. International Journal of Computer Vision, 77, 157–173.

  31. Opelt, A., Pinz, A., Fussenegger, M., Auer, P. (2006). Generic object recognition with boosting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(3), 416–431.

  32. Everingham, M., Van Gool, L., Williams, C.K.I., Winn, J., Zisserman, A. (2010). The Pascal Visual Object Classes (VOC) challenge. International Journal of Computer Vision, 88(2), 303–338.

  33. Shen, X., & Wu, Y. (2012). A unified approach to salient object detection via low rank matrix recovery. In Proceedings of IEEE international conference on computer vision and pattern recognition.

  34. Fan, R.-E., Chang, K.-W., Hsieh, C.-J., Wang, X.-R., Lin, C.-J. (2008). LIBLINEAR: A library for large linear classification. Journal of Machine Learning Research, 9, 1871–1874.

  35. Zhai, Y., & Shah, M. (2006). Visual attention detection in video sequences using spatiotemporal cues. In Proceedings of the 14th annual ACM international conference on multimedia (pp. 815–824).

  36. Qiu, Y., Zhu, J., Zhang, R., Huang, J. (2012). Top-Down saliency by multi-scale contextual pooling. In Proceedings of the 13th Pacific-Rim conference on multimedia (pp. 294–305).

Acknowledgments

The authors thank Quan Zhou for valuable comments. We also thank Mingming Cheng, Ming-Hsuan Yang, Laurent Itti and Tilke Judd for providing the code of their methods for the experimental comparison.

Author information

Corresponding author

Correspondence to Jun Zhu.

Additional information

This work is funded by NSFC projects (61071155, 61201446), the 973 National Program (2010CB731401, 2010CB731406), and STCSM (12DZ2272600).

About this article

Cite this article

Zhu, J., Qiu, Y., Zhang, R. et al. Top-Down Saliency Detection via Contextual Pooling. J Sign Process Syst 74, 33–46 (2014). https://doi.org/10.1007/s11265-013-0768-9
