Ensemble Clustering of High Dimensional Data with FastMap Projection

  • Conference paper
Trends and Applications in Knowledge Discovery and Data Mining (PAKDD 2014)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8643)

Abstract

In this paper, we propose an ensemble clustering method for high-dimensional data that uses FastMap projection to generate subspace component data sets. Compared with the popular random sampling and random projection methods, FastMap projection better preserves the clustering structure of the original data in the component data sets, which significantly improves the performance of ensemble clustering. We present two methods for measuring how well the generated component data sets preserve the clustering structure. The comparison shows that FastMap preserves the clustering structure better than random sampling and random projection. Experiments were conducted on three real data sets with three data generation methods and three consensus functions. The results show that ensemble clustering with FastMap projection outperforms ensemble clustering with random sampling and random projection.
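
As a rough illustration of how FastMap can produce several low-dimensional component data sets, the following Python sketch implements the classic FastMap recurrence of Faloutsos and Lin for Euclidean input. It is not the authors' implementation: the function names `fastmap` and `fastmap_components`, the seeding scheme, and the use of a random pivot start as the source of diversity between components are all assumptions made for this example.

```python
import math
import random


def fastmap(points, k, seed=None):
    """Project points (equal-length numeric sequences) to k dimensions.

    Hypothetical helper: a plain FastMap sketch, assuming Euclidean distance.
    """
    rng = random.Random(seed)
    n = len(points)
    coords = [[0.0] * k for _ in range(n)]

    def sq_dist(i, j, col):
        # Squared original distance minus the part already captured by
        # the first `col` FastMap coordinates (the residual distance).
        d2 = sum((p - q) ** 2 for p, q in zip(points[i], points[j]))
        d2 -= sum((coords[i][c] - coords[j][c]) ** 2 for c in range(col))
        return max(d2, 0.0)  # guard against tiny negative rounding errors

    def farthest_from(i, col):
        return max(range(n), key=lambda j: sq_dist(i, j, col))

    for col in range(k):
        # Pivot heuristic: start at a random object, then jump twice to
        # the farthest object to get a well-separated pivot pair (a, b).
        start = rng.randrange(n)
        a = farthest_from(start, col)
        b = farthest_from(a, col)
        d_ab2 = sq_dist(a, b, col)
        if d_ab2 < 1e-12:
            break  # no residual spread left; remaining coordinates stay 0
        d_ab = math.sqrt(d_ab2)
        for i in range(n):
            # Cosine-law projection of object i onto the line through a and b.
            coords[i][col] = (
                sq_dist(a, i, col) + d_ab2 - sq_dist(b, i, col)
            ) / (2.0 * d_ab)
    return coords


def fastmap_components(points, k, n_components, seed=0):
    """Generate several FastMap projections that differ in their pivot start.

    Assumption: varying the seed is used here only to illustrate how
    multiple component data sets could be obtained.
    """
    return [fastmap(points, k, seed=seed + m) for m in range(n_components)]
```

Each returned component could then be clustered independently, for example with k-means, and the resulting partitions combined by a consensus function.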

Acknowledgment

This research is supported by Shenzhen New Industry Development Fund under Grant No. JC201005270342A.

Author information

Corresponding author

Correspondence to Imran Khan.

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Khan, I., Huang, J.Z., Tung, N.T., Williams, G. (2014). Ensemble Clustering of High Dimensional Data with FastMap Projection. In: Peng, WC., et al. Trends and Applications in Knowledge Discovery and Data Mining. PAKDD 2014. Lecture Notes in Computer Science, vol 8643. Springer, Cham. https://doi.org/10.1007/978-3-319-13186-3_43

  • DOI: https://doi.org/10.1007/978-3-319-13186-3_43

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-13185-6

  • Online ISBN: 978-3-319-13186-3

  • eBook Packages: Computer Science, Computer Science (R0)
