
GMM-ClusterForest: A Novel Indexing Approach for Multi-features Based Similarity Search in High-Dimensional Spaces

  • Conference paper
Neural Information Processing (ICONIP 2012)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 7664)

Abstract

This paper proposes GMM-ClusterForest, a novel clustering-based indexing approach that supports multi-feature similarity search in high-dimensional spaces. We fit a Gaussian Mixture Model (GMM) to the data, using the Expectation-Maximization (EM) algorithm to estimate the GMM parameters and the Minimum Description Length (MDL) criterion to select the GMM structure. Each Gaussian component of the GMM is taken as a cluster center, and each data point is assigned to a cluster according to the Bayesian decision rule. Applying this clustering method hierarchically yields an index tree for one type of feature, together with a corresponding similarity search method. Multi-feature similarity search is then achieved by fusing the index trees built for all feature types considered. We evaluated the proposed approach by applying it to example-based image retrieval, with experiments on the Corel 1000 dataset and a large self-collected dataset. The experimental results show that our approach is effective and promising.
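
The clustering and indexing steps described in the abstract can be sketched for a single feature type as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes diagonal covariance matrices, uses the common two-part MDL score -log L + (p/2) log n to choose the number of components, and answers a query by descending greedily into the single most probable child. The function names (`fit_gmm_em`, `mdl_select`, `build_tree`, `search`) and the parameters `k_max` and `leaf_size` are hypothetical. The abstract does not describe how the per-feature trees are fused, so fusion is not shown.

```python
import numpy as np


def _logsumexp(a, axis):
    # Numerically stable log(sum(exp(a))) along one axis.
    m = a.max(axis=axis, keepdims=True)
    return (m + np.log(np.exp(a - m).sum(axis=axis, keepdims=True))).squeeze(axis)


def _log_joint(X, w, mu, var):
    # log(pi_k) + log N(x | mu_k, diag(var_k)) for every (point, component) pair.
    # Assumption: diagonal covariance matrices.
    diff = X[:, None, :] - mu[None, :, :]
    maha = (diff ** 2 / var[None, :, :]).sum(axis=2)
    log_det = np.log(var).sum(axis=1)
    const = X.shape[1] * np.log(2 * np.pi)
    return np.log(w)[None, :] - 0.5 * (const + log_det[None, :] + maha)


def fit_gmm_em(X, k, n_iter=100, reg=1e-6, seed=0):
    # EM for a k-component GMM; returns (weights, means, variances, log-likelihood).
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    mu = X[rng.choice(n, k, replace=False)].astype(float)
    var = np.tile(X.var(axis=0) + reg, (k, 1))
    w = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        lj = _log_joint(X, w, mu, var)
        resp = np.exp(lj - _logsumexp(lj, axis=1)[:, None])  # E-step: posteriors
        nk = resp.sum(axis=0) + 1e-12                         # M-step: re-estimate
        w = nk / n
        mu = resp.T @ X / nk[:, None]
        var = np.maximum(resp.T @ X ** 2 / nk[:, None] - mu ** 2, 0.0) + reg
    ll = _logsumexp(_log_joint(X, w, mu, var), axis=1).sum()
    return w, mu, var, ll


def mdl_select(X, k_max):
    # Assumed two-part MDL: -log L + (p / 2) log n, where a diagonal GMM with
    # k components in d dimensions has p = (k - 1) + 2kd free parameters.
    n, d = X.shape
    best = None
    for k in range(1, min(k_max, n) + 1):
        w, mu, var, ll = fit_gmm_em(X, k)
        score = -ll + 0.5 * ((k - 1) + 2 * k * d) * np.log(n)
        if best is None or score < best[0]:
            best = (score, w, mu, var)
    return best[1:]


def build_tree(X, ids, k_max=4, leaf_size=8):
    # Recursively split the points in `ids`; each child holds one component's points.
    node = {"ids": ids, "gmm": None, "children": {}}
    if len(ids) <= leaf_size:
        return node
    w, mu, var = mdl_select(X[ids], k_max)
    labels = _log_joint(X[ids], w, mu, var).argmax(axis=1)  # Bayesian decision rule
    present = np.unique(labels)
    if len(present) < 2:  # no useful split: keep this node as a leaf
        return node
    node["gmm"] = (w, mu, var)
    node["children"] = {int(c): build_tree(X, ids[labels == c], k_max, leaf_size)
                        for c in present}
    return node


def search(node, X, q, topk=5):
    # Approximate k-NN: follow the child whose component has the highest
    # posterior for q, then rank that leaf's points by Euclidean distance.
    while node["children"]:
        lj = _log_joint(q[None, :], *node["gmm"])[0]
        comps = list(node["children"])
        node = node["children"][comps[int(np.argmax(lj[comps]))]]
    ids = node["ids"]
    dist = np.linalg.norm(X[ids] - q, axis=1)
    return ids[np.argsort(dist)[:topk]].tolist()


# Usage on synthetic 2-D data with three well-separated blobs.
rng = np.random.default_rng(1)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [-10.0, 10.0]])
X = np.vstack([c + 0.5 * rng.standard_normal((30, 2)) for c in centers])
tree = build_tree(X, np.arange(len(X)))
print(search(tree, X, X[5], topk=3))  # indices of approximate nearest neighbors
```

Because a query follows only one branch per level, this search is approximate: a true nearest neighbor that was assigned to a different cluster can be missed. Exploring several high-posterior branches instead of one would trade speed for recall.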

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Wan, Y. et al. (2012). GMM-ClusterForest: A Novel Indexing Approach for Multi-features Based Similarity Search in High-Dimensional Spaces. In: Huang, T., Zeng, Z., Li, C., Leung, C.S. (eds) Neural Information Processing. ICONIP 2012. Lecture Notes in Computer Science, vol 7664. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-34481-7_26

  • DOI: https://doi.org/10.1007/978-3-642-34481-7_26

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-34480-0

  • Online ISBN: 978-3-642-34481-7

  • eBook Packages: Computer Science, Computer Science (R0)