Abstract
In this paper, we propose an ensemble clustering method for high dimensional data which uses FastMap projection to generate subspace component data sets. In comparison with popular random sampling and random projection, FastMap projection preserves the clustering structure of the original data in the component data sets so that the performance of ensemble clustering is improved significantly. We present two methods to measure preservation of clustering structure of generated component data sets. The comparison results have shown that FastMap preserved the clustering structure better than random sampling and random projection. Experiments on three real data sets were conducted with three data generation methods and three consensus functions. The results have shown that the ensemble clustering with FastMap projection outperformed the ensemble clusterings with random sampling and random projection.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
References
Law, M., Topchy, A., Jain, A.: Multiobjective data clustering. In: Proceedings of CVPR, pp. 424–430 (2004)
Chávez, E., Navarro, G.: A probabilistic spell for the curse of dimensionality. In: Buchsbaum, A.L., Snoeyink, J. (eds.) ALENEX 2001. LNCS, vol. 2153, pp. 147–160. Springer, Heidelberg (2001)
Bingham, E., Mannila, H.: Random projection in dimensionality reduction: applications to image and text data. In: Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 245–250 (2001)
Fern, X.Z., Brodley, C.E.: Random projection for high dimensional data clustering: a cluster ensemble approach. In: International Conference on Machine Learning (2003)
Faloutsos, C., Lin, K.: Fastmap: a fast algorithm for indexing, data-mining and visualization of traditional and multimedia datasets. In: Proceedings of ACM-SIGMOD, pp. 163–174 (1995)
Xue, H., Chen, S., Yang, Q.: Discriminatively regularized least-squares classification. Pattern Recognit. 42, 93–104 (2009)
Dasgupta, S.: Experiments with random projection. In: Uncertainty in Artificial Intelligence: Proceedings of the Sixteenth Conference, pp. 143–151 (2000)
Domeniconi, C., Al-Razgan, M.: Weighted cluster ensembles: methods and analysis. ACM Trans. KDD 2, 1–40 (2009)
Karypis, G., Kumar, V.: A fast and high quality multilevel scheme for partitioning irregular graphs. SIAM J. Sci. Comput. 20, 359–392 (1998)
Kriegel, H., Kroger, P., Zimek, A.: Clustering high-dimensional data: a survey on subspace clustering, pattern-based clustering and correlation clustering. ACM Trans. KDD 3, 1–58 (2009)
Aswani Kumar, C.: Reducing data dimensionality using random projections and fuzzy k-means clustering. Int. J. Intell. Comput. Cybern. 4, 353–365 (2011)
Karypis, G., Aggarwal, R., Kumar, V., Shekhar, S.: Multilevel hypergraph partitioning: applications in vlsi domain. In: Proceedings of Conference on Design and Automation (1997)
Kuncheva, L., Vetrov, D.: Evaluation of stability of k-means cluster ensembles with respect to random initialization. IEEE Trans. PAMI 28, 1798–1808 (2006)
Kuncheva, L.I., Hadjitodorov, S.T.: Using diversity in cluster ensembles. In: IEEE International Conference on Systems, pp. 1214–1219 (2004)
Strehl, A., Ghosh, J.: Cluster ensembles: a knowledge reuse framework for combining multiple partitions. J. Mach. Learn. Res. 3, 583–617 (2002)
Zhang, L., Mahdavi, M., Jin, R., Yang, T., Zhu, S.: Recovering optimal solution by dual random projection. In: JMLR: Workshop and Conference Proceedings, vol. 30, pp. 1–23 (2012)
Acknowledgment
This research is supported by Shenzhen New Industry Development Fund under Grant No. JC201005270342A.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Khan, I., Huang, J.Z., Tung, N.T., Williams, G. (2014). Ensemble Clustering of High Dimensional Data with FastMap Projection. In: Peng, WC., et al. Trends and Applications in Knowledge Discovery and Data Mining. PAKDD 2014. Lecture Notes in Computer Science(), vol 8643. Springer, Cham. https://doi.org/10.1007/978-3-319-13186-3_43
Download citation
DOI: https://doi.org/10.1007/978-3-319-13186-3_43
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-13185-6
Online ISBN: 978-3-319-13186-3
eBook Packages: Computer ScienceComputer Science (R0)