Compact Representation for Large-Scale Clustering and Similarity Search

  • Conference paper
Advances in Multimedia Information Processing - PCM 2006 (PCM 2006)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4261)

Abstract

Although content-based image retrieval has been researched for many years, few content-based methods are implemented in present-day image search engines. This is partly because of the great difficulty of indexing and searching high-dimensional feature spaces for large-scale image datasets. In this paper, we propose a novel method that represents the content of each image as one or more hash codes, which can be treated as special keywords. With this compact representation, images can be accessed very quickly by their visual content. Two advanced functionalities are also implemented. One is content-based image clustering, which is simplified to grouping images with identical or near-identical hash codes. The other is content-based similarity search, which is approximated by finding images with similar hash codes. The hash code extraction process is very simple, and both image clustering and similarity search can be performed in real time. Experiments on over 11 million images collected from the web demonstrate the efficiency and effectiveness of the proposed method.
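
The abstract does not say how the hash codes are extracted, so the following is only a minimal sketch of the general idea under stated assumptions: each image is assumed to already have a small feature vector, and each dimension is assumed to be binarized against a fixed threshold to form one bit of the code. The function names, the thresholds, the Hamming-distance criterion for "similar" codes, and the toy feature values are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def hash_code(features, thresholds):
    """Pack one bit per dimension: 1 if the feature exceeds its threshold.

    Assumption: simple per-dimension thresholding, not the paper's scheme.
    """
    code = 0
    for value, threshold in zip(features, thresholds):
        code = (code << 1) | (1 if value > threshold else 0)
    return code


def cluster_by_code(codes):
    """Group image ids that share an identical hash code."""
    groups = defaultdict(list)
    for image_id, code in codes.items():
        groups[code].append(image_id)
    return dict(groups)


def hamming(a, b):
    """Number of bit positions in which two integer codes differ."""
    return bin(a ^ b).count("1")


def similar(query_code, codes, max_distance):
    """Return image ids whose code is within max_distance bits of the query,
    ordered by distance and then by image id."""
    hits = [(hamming(query_code, code), image_id)
            for image_id, code in codes.items()]
    return [image_id for dist, image_id in sorted(hits) if dist <= max_distance]


# Toy data: hypothetical 4-dimensional features for four images.
features = {
    "a.jpg": [0.9, 0.1, 0.8, 0.2],
    "b.jpg": [0.8, 0.2, 0.7, 0.1],
    "c.jpg": [0.9, 0.1, 0.8, 0.7],
    "d.jpg": [0.1, 0.9, 0.2, 0.8],
}
thresholds = [0.5, 0.5, 0.5, 0.5]

codes = {image_id: hash_code(f, thresholds) for image_id, f in features.items()}
clusters = cluster_by_code(codes)
neighbours = similar(codes["a.jpg"], codes, max_distance=1)
```

In this toy run, `a.jpg` and `b.jpg` receive the same code and fall into one cluster, while `c.jpg` differs from them by a single bit and is returned by the similarity search with `max_distance=1`. The linear scan in `similar` is kept for brevity; a system at the scale described in the abstract would presumably use the codes as index keys instead.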

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Wang, B., Chen, Y., Li, Z., Li, M. (2006). Compact Representation for Large-Scale Clustering and Similarity Search. In: Zhuang, Y., Yang, SQ., Rui, Y., He, Q. (eds) Advances in Multimedia Information Processing - PCM 2006. PCM 2006. Lecture Notes in Computer Science, vol 4261. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11922162_95

  • DOI: https://doi.org/10.1007/11922162_95

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-48766-1

  • Online ISBN: 978-3-540-48769-2

  • eBook Packages: Computer Science, Computer Science (R0)
