Abstract
Bridging the cognitive gap in image retrieval has been an active research direction in recent years. A key challenge is obtaining enough training data to learn the mapping functions from low-level feature spaces to high-level semantics. In this paper, image regions are classified into two types: key regions, which represent the main semantic content, and environmental regions, which represent the context. We attempt to leverage the correlations between these two types of regions to improve the performance of image retrieval. A Context Expansion approach is explored that takes advantage of such correlations by expanding the key regions of a query with highly correlated environmental regions according to an image thesaurus. The thesaurus serves both as a mapping function between low-level image features and concepts and as a store of the statistical correlations between different concepts. It is constructed through a data-driven approach that uses Web data (images and their surrounding textual annotations) as the training data source to learn the region concepts and to explore the statistical correlations. Experimental results on a database of 10,000 general-purpose images show the effectiveness of the proposed approach in improving both search precision (i.e., filtering out irrelevant images) and recall (i.e., retrieving relevant images whose context may vary). Several major factors that affect the performance of the approach are also studied.
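The idea of expanding a key concept with correlated context can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not specify how correlations are estimated, so the sketch assumes a simple co-occurrence estimate of P(environmental concept | key concept) over annotated images, and it works on concept labels rather than on region features. The function names `build_cooccurrence` and `expand_query`, the `top_k` parameter, and the toy data are all hypothetical.

```python
from collections import Counter
from itertools import permutations


def build_cooccurrence(annotated_images):
    """Count concept occurrences and ordered concept pairs per image.

    Assumption: each image is given as a list of concept labels. This stands
    in for the thesaurus' correlation store and is only illustrative.
    """
    pair_counts = Counter()
    concept_counts = Counter()
    for concepts in annotated_images:
        unique = set(concepts)
        concept_counts.update(unique)
        # Every ordered pair (a, b) of distinct concepts in the same image.
        pair_counts.update(permutations(sorted(unique), 2))
    return pair_counts, concept_counts


def expand_query(key_concept, environmental, pair_counts, concept_counts, top_k=2):
    """Return up to top_k environmental concepts ranked by P(env | key)."""
    total = concept_counts[key_concept]
    if total == 0:
        return []  # unknown key concept: nothing to expand with
    scored = [
        (env, pair_counts[(key_concept, env)] / total)
        for env in sorted(environmental)
        if pair_counts[(key_concept, env)] > 0
    ]
    # Stable sort: ties keep alphabetical order from the sorted() above.
    scored.sort(key=lambda item: -item[1])
    return scored[:top_k]


# Hypothetical toy annotations, not data from the paper.
images = [
    ["tiger", "grass"],
    ["tiger", "grass", "sky"],
    ["tiger", "water"],
    ["boat", "water", "sky"],
]
pair_counts, concept_counts = build_cooccurrence(images)
expansion = expand_query(
    "tiger", {"grass", "sky", "water"}, pair_counts, concept_counts
)
print(expansion)
```

In this toy data "grass" co-occurs with "tiger" in two of the three tiger images, so it ranks first among the expansion candidates. A real system would attach such expansions to region-level features rather than to text labels.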
Cite this article
Wang, XJ., Ma, WY. & Li, X. Exploring statistical correlations for image retrieval. Multimedia Systems 11, 340–351 (2006). https://doi.org/10.1007/s00530-006-0013-5
DOI: https://doi.org/10.1007/s00530-006-0013-5