Salient region detection and segmentation for general object recognition and image understanding

  • Research Papers
  • Special Focus
Science China Information Sciences

Abstract

General object recognition and image understanding are widely regarded as an ambitious goal for computer vision and multimedia retrieval. Despite the great effort devoted to them over the last two decades, they remain an open problem. In this paper, we propose a selective attention-driven model for general image understanding, named GORIUM (general object recognition and image understanding model). The key idea of our model is to discover recurring visual objects in a large image set, in an unsupervised manner, by combining selective attention modeling with pairwise matching of local invariant features. To this end, the model is formulated as a four-layer bottom-up model: salient region detection, object segmentation, automatic object discovery, and visual dictionary construction. By using multi-task learning to model visual saliency with bottom-up and top-down factors simultaneously, the lowest layer can effectively detect salient objects in an image. The second layer uses a simple yet effective learning approach to generate two complementary maps from several raw saliency maps, which can then be used to segment the salient objects precisely from a complex scene. For the third layer, we have implemented an unsupervised approach that automatically discovers general objects in a large image set by pairwise matching of local invariant features. Finally, the visual dictionary can be constructed with any of the many state-of-the-art algorithms and tools now available.
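
The third layer described above, object discovery by pairwise matching of local invariant features, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `count_matches` and `discover_groups`, the thresholds `ratio` and `min_matches`, and the toy 2-D descriptors (standing in for real local descriptors such as SIFT or SURF vectors) are all assumptions made for illustration. The sketch links two segmented regions when enough descriptors pass a nearest-neighbour ratio test, then reports connected groups as candidate recurring objects.

```python
from itertools import combinations
from math import dist


def count_matches(desc_a, desc_b, ratio=0.8):
    """Count descriptors in desc_a whose nearest neighbour in desc_b
    passes a ratio test (one-directional, from a to b)."""
    matches = 0
    for d in desc_a:
        distances = sorted(dist(d, e) for e in desc_b)
        if len(distances) < 2:
            continue  # the ratio test needs a runner-up
        nearest, second = distances[0], distances[1]
        # Accept only if the best match is clearly better than the runner-up.
        if nearest < ratio * second:
            matches += 1
    return matches


def discover_groups(descriptors, min_matches=2, ratio=0.8):
    """Link regions that share enough matches, then return the
    connected components as sorted lists of region names."""
    names = list(descriptors)
    parent = {n: n for n in names}  # union-find forest

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for a, b in combinations(names, 2):
        if count_matches(descriptors[a], descriptors[b], ratio) >= min_matches:
            parent[find(a)] = find(b)

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical toy data: each region is a list of 2-D "descriptors".
regions = {
    "img1": [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)],
    "img2": [(0.1, 0.0), (10.0, 0.1), (0.1, 10.0)],
    "img3": [(50.0, 50.0), (60.0, 50.0), (50.0, 60.0)],
}
groups = discover_groups(regions)
print(groups)
```

In this toy set, `img1` and `img2` hold nearly identical descriptors and end up in one group, while `img3` stays on its own. A real system would extract descriptors only from the segmented salient regions, which is what keeps the pairwise matching focused on objects rather than background.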



Author information

Corresponding author

Correspondence to YongHong Tian.

Additional information

HUANG TieJun was born in 1970. He received the Ph.D. degree in pattern recognition and intelligent systems from Huazhong University of Science and Technology in 1998. He is currently a professor at the School of Electrical Engineering and Computer Science of Peking University and the vice director of the National Engineering Laboratory of Video Technology of China. He is supported by the New Century Excellent Talents in University program of the Ministry of Education of China. His research interests include image understanding, video coding, digital libraries and digital rights management. He is a council member of the Chinese Institute of Electronics, a senior member of the China Computer Federation, a member of the Board of Directors of the Digital Media Project, and a member of the advisory board of IEEE Computing Now.

TIAN YongHong was born in 1975. He received the Ph.D. degree in computer application technology from the Institute of Computing Technology, Chinese Academy of Sciences, in 2005. He is currently an associate professor at the National Engineering Laboratory of Video Technology, School of Electrical Engineering and Computer Science of Peking University. His research interests include machine learning and multimedia content analysis, retrieval, and copyright management. He is a senior member of IEEE.

Electronic supplementary material

Supplementary material, approximately 11.5 MB.


About this article

Cite this article

Huang, T., Tian, Y., Li, J. et al. Salient region detection and segmentation for general object recognition and image understanding. Sci. China Inf. Sci. 54, 2461–2470 (2011). https://doi.org/10.1007/s11432-011-4487-1


  • DOI: https://doi.org/10.1007/s11432-011-4487-1
