Abstract
General object recognition and image understanding are recognized as an ambitious goal for computer vision and multimedia retrieval. Despite the great effort devoted to them over the last two decades, they remain an open problem. In this paper, we propose a selective attention-driven model for general image understanding, named GORIUM (general object recognition and image understanding model). The key idea of our model is to discover recurring visual objects in a large image set, in an unsupervised manner, by combining selective attention modeling with pairwise matching of local invariant features. To this end, the model is formulated as four bottom-up layers: salient region detection, object segmentation, automatic object discovery, and visual dictionary construction. The lowest layer uses multi-task learning to model visual saliency from bottom-up and top-down factors simultaneously, which allows it to detect salient objects in an image effectively. The second layer uses a simple yet effective learning approach to generate two complementary maps from several raw saliency maps; these maps are then used to segment the salient objects precisely from a complex scene. For the third layer, we have implemented an unsupervised approach that automatically discovers general objects in a large image set by pairwise matching of local invariant features. Finally, the visual dictionary can be constructed with existing state-of-the-art algorithms and tools.
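The role of the two complementary maps in the second layer can be illustrated with a minimal sketch. It is not the learning approach of the paper: as a loudly stated assumption, a pixelwise minimum and maximum over the raw saliency maps stand in for the learned fusion, and the function names `complementary_maps` and `seed_labels` as well as both thresholds are hypothetical. The sketch only shows how a conservative map and a permissive map could yield foreground and background seeds for a subsequent segmentation step.

```python
from typing import List, Tuple

Map = List[List[float]]  # saliency values in [0, 1], indexed as map[row][col]


def complementary_maps(raw_maps: List[Map]) -> Tuple[Map, Map]:
    """Fuse raw saliency maps into a conservative and a permissive map.

    Assumption: pixelwise min/max replaces the learned fusion of the paper.
    """
    if not raw_maps:
        raise ValueError("need at least one raw saliency map")
    rows, cols = len(raw_maps[0]), len(raw_maps[0][0])
    # Conservative map: a pixel is salient only if every raw map agrees.
    conservative = [[min(m[r][c] for m in raw_maps) for c in range(cols)]
                    for r in range(rows)]
    # Permissive map: a pixel is salient if any raw map says so.
    permissive = [[max(m[r][c] for m in raw_maps) for c in range(cols)]
                  for r in range(rows)]
    return conservative, permissive


def seed_labels(conservative: Map, permissive: Map,
                fg_thresh: float = 0.6,
                bg_thresh: float = 0.2) -> List[List[str]]:
    """Label each pixel as a foreground seed, background seed, or unknown.

    Both thresholds are hypothetical values chosen for illustration.
    """
    labels = []
    for cons_row, perm_row in zip(conservative, permissive):
        row = []
        for cons, perm in zip(cons_row, perm_row):
            if cons >= fg_thresh:
                row.append("fg")       # salient even in the conservative map
            elif perm <= bg_thresh:
                row.append("bg")       # not salient even in the permissive map
            else:
                row.append("unknown")  # left to the segmentation step
        labels.append(row)
    return labels


# Two toy 2x2 raw saliency maps.
map_a = [[0.9, 0.1], [0.7, 0.3]]
map_b = [[0.8, 0.2], [0.4, 0.1]]
conservative, permissive = complementary_maps([map_a, map_b])
labels = seed_labels(conservative, permissive)
```

In a sketch of this kind, the pixels labeled "unknown" would be resolved by a seeded segmentation method, which takes the foreground and background seeds as its initialization.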
Author information
Additional information
HUANG TieJun was born in 1970. He received the Ph.D. degree in pattern recognition and intelligent systems from Huazhong University of Science and Technology in 1998. He is currently a professor at the School of Electrical Engineering and Computer Science of Peking University and the vice director of the National Engineering Laboratory of Video Technology of China. He is supported by the New Century Excellent Talents in University program of the Ministry of Education of China. His research interests include image understanding, video coding, digital libraries and digital rights management. He is a council member of the Chinese Institute of Electronics, a senior member of the China Computer Federation, a member of the Board of Directors of the Digital Media Project and a member of the advisory board of IEEE Computing Now.
TIAN YongHong was born in 1975. He received the Ph.D. degree in computer application technology from the Institute of Computing Technology, Chinese Academy of Sciences in 2005. He is currently an associate professor at the National Engineering Laboratory of Video Technology, School of Electrical Engineering and Computer Science of Peking University. His research interests include machine learning and multimedia content analysis, retrieval, and copyright management. He is a senior member of IEEE.
Electronic supplementary material
Supplementary material, approximately 11.5 MB.
About this article
Cite this article
Huang, T., Tian, Y., Li, J. et al. Salient region detection and segmentation for general object recognition and image understanding. Sci. China Inf. Sci. 54, 2461–2470 (2011). https://doi.org/10.1007/s11432-011-4487-1
DOI: https://doi.org/10.1007/s11432-011-4487-1