Abstract
Clustering ensemble aggregates several base clusterings into a consensus clustering result that is more accurate, stable, and meaningful than the result of a standard clustering algorithm. In this paper, the ensemble information is described by a data-cluster association matrix. However, most data-cluster association matrices overlook an important type of information: the relationships between clusters. This paper proposes a new method, WETU, that refines the data-cluster association matrix with a link-based similarity measure. The refined matrix is obtained from the similarity of clusters across all base clustering results, rather than within a single base clustering result. In addition, WETU can provide more discriminative information than CSM and WTU. The data-cluster association matrix is refined into a high-level real-valued matrix, which can be aggregated by a real-valued method such as Global k-means. Experiments on a synthetic dataset and UCI datasets show that the proposed method outperforms standard K-means, the base clustering algorithm, CSM+Global k-means, and WTU+Global k-means.
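To make the data-cluster association matrix concrete, the following is a minimal sketch in plain Python. It builds the binary matrix (one row per data point, one column per cluster of every base clustering) and then fills the zero entries with real values. The fill-in rule shown here is an assumption for illustration only: it uses a Jaccard overlap between clusters taken across the other base clusterings, which is a simple stand-in and not the WETU measure, whose definition the abstract does not give. The function names `association_matrix`, `jaccard`, and `refine_with_jaccard` are hypothetical, and the consensus step (for example Global k-means on the refined rows) is not shown.

```python
from typing import List, Set, Tuple


def association_matrix(
    base_labels: List[List[int]],
) -> Tuple[List[List[float]], List[Tuple[int, int]]]:
    """Build the binary data-cluster association matrix.

    base_labels[m][i] is the label of data point i in base clustering m.
    Returns (matrix, columns), where columns[j] = (m, label) names column j.
    """
    n_points = len(base_labels[0])
    columns = [
        (m, label)
        for m, labels in enumerate(base_labels)
        for label in sorted(set(labels))
    ]
    matrix = [
        [1.0 if base_labels[m][i] == label else 0.0 for (m, label) in columns]
        for i in range(n_points)
    ]
    return matrix, columns


def members(base_labels: List[List[int]], m: int, label: int) -> Set[int]:
    """Indices of the data points assigned to `label` in base clustering m."""
    return {i for i, l in enumerate(base_labels[m]) if l == label}


def jaccard(a: Set[int], b: Set[int]) -> float:
    """Jaccard overlap of two clusters, viewed as sets of point indices."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def refine_with_jaccard(
    base_labels: List[List[int]],
    matrix: List[List[float]],
    columns: List[Tuple[int, int]],
) -> List[List[float]]:
    """Fill zero entries with a cross-clustering similarity (illustrative).

    ASSUMPTION: this is not WETU. For point i and a cluster C from base
    clustering m that does not contain i, the entry becomes the mean Jaccard
    overlap between C and the clusters that contain i in every OTHER base
    clustering. Entries equal to 1 are left unchanged.
    """
    n_base = len(base_labels)
    col_sets = [members(base_labels, m, label) for (m, label) in columns]
    refined = [row[:] for row in matrix]
    for i, row in enumerate(matrix):
        for j, (m, _) in enumerate(columns):
            if row[j] == 1.0:
                continue
            sims = [
                jaccard(members(base_labels, q, base_labels[q][i]), col_sets[j])
                for q in range(n_base)
                if q != m
            ]
            if sims:
                refined[i][j] = sum(sims) / len(sims)
    return refined


# Two toy base clusterings of four data points.
base_labels = [[0, 0, 1, 1], [0, 0, 0, 1]]
matrix, columns = association_matrix(base_labels)
refined = refine_with_jaccard(base_labels, matrix, columns)
```

Each row of `refined` is a real-valued feature vector for one data point, so any clustering method that accepts real-valued input could be run on these rows to produce a consensus partition.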
About this article
Cite this article
Hao, ZF., Wang, LJ., Cai, RC. et al. An improved clustering ensemble method based link analysis. World Wide Web 18, 185–195 (2015). https://doi.org/10.1007/s11280-013-0208-6
DOI: https://doi.org/10.1007/s11280-013-0208-6