Abstract
Although content-based image retrieval has been researched for many years, few content-based methods are implemented in present-day image search engines. This is partly because indexing and searching high-dimensional feature spaces is very difficult for large-scale image datasets. In this paper, we propose a novel method that represents the content of each image as one or more hash codes, which can be regarded as special keywords. Based on this compact representation, images can be accessed very quickly by their visual content. Two advanced functionalities are built on top of it. The first is content-based image clustering, which reduces to grouping images with identical or near-identical hash codes. The second is content-based similarity search, which is approximated by finding images with similar hash codes. The hash code extraction process is very simple, and both image clustering and similarity search can be performed in real time. Experiments on over 11 million images collected from the web demonstrate the efficiency and effectiveness of the proposed method.
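The idea of treating hash codes as keywords can be illustrated with a minimal sketch. The abstract does not specify how the hash codes are extracted, so everything below is an assumption made for illustration: the one-bit-per-dimension quantizer, the fixed thresholds, the toy feature vectors, the use of Hamming distance as the notion of "similar hash codes", and all function names are hypothetical rather than the authors' method.

```python
from collections import defaultdict


def hash_code(features, thresholds):
    """Quantize each feature to one bit (1 if above its threshold, else 0).

    Assumed quantizer for illustration; the abstract does not define one.
    """
    code = 0
    for value, threshold in zip(features, thresholds):
        code = (code << 1) | (1 if value > threshold else 0)
    return code


def cluster_by_code(codes):
    """Group image ids that share an identical hash code."""
    groups = defaultdict(list)
    for image_id, code in codes.items():
        groups[code].append(image_id)
    return dict(groups)


def hamming(a, b):
    """Number of bit positions in which two integer codes differ."""
    return bin(a ^ b).count("1")


def similar_images(query_code, codes, max_distance):
    """Return ids whose code is within max_distance bits of the query.

    A linear scan keeps the sketch short; it is not an indexing scheme.
    """
    return sorted(
        image_id
        for image_id, code in codes.items()
        if hamming(query_code, code) <= max_distance
    )


# Toy 4-dimensional feature vectors (hypothetical data).
features = {
    "a.jpg": [0.9, 0.1, 0.8, 0.2],
    "b.jpg": [0.8, 0.2, 0.7, 0.1],
    "c.jpg": [0.9, 0.1, 0.2, 0.3],
    "d.jpg": [0.1, 0.9, 0.1, 0.9],
}
thresholds = [0.5, 0.5, 0.5, 0.5]

codes = {name: hash_code(vec, thresholds) for name, vec in features.items()}
clusters = cluster_by_code(codes)
neighbours = similar_images(codes["a.jpg"], codes, max_distance=1)
```

In this toy data, `a.jpg` and `b.jpg` receive the same code and therefore fall into one cluster, while `c.jpg` differs from them by a single bit and is returned only by the similarity query. At the scale reported in the abstract, the codes would presumably be stored in an index keyed by code rather than scanned linearly.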
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Wang, B., Chen, Y., Li, Z., Li, M. (2006). Compact Representation for Large-Scale Clustering and Similarity Search. In: Zhuang, Y., Yang, SQ., Rui, Y., He, Q. (eds) Advances in Multimedia Information Processing - PCM 2006. PCM 2006. Lecture Notes in Computer Science, vol 4261. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11922162_95
DOI: https://doi.org/10.1007/11922162_95
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-48766-1
Online ISBN: 978-3-540-48769-2
eBook Packages: Computer Science (R0)