An improved clustering ensemble method based link analysis

World Wide Web

Abstract

Clustering ensemble methods aggregate several base clusterings into a consensus clustering that is more accurate, stable, and meaningful than the result of a single standard clustering algorithm. In this paper, the ensemble information is described by a data-cluster association matrix. However, most data-cluster association matrices overlook an important type of information: the relationship between clusters. This paper proposes a new method, WETU, that refines the data-cluster association matrix with a link-based similarity measure. The refined matrix is obtained from the similarity of clusters across all base clustering results, not within a single base clustering result. In addition, WETU can provide more discriminative information than CSM and WTU. The data-cluster association matrix is refined into a high-level real-valued matrix, which can be aggregated by a real-valued method such as Global k-means. Experiments on a synthetic dataset and UCI datasets show that the proposed method outperforms standard k-means, the base clustering algorithm, CSM+Global k-means, and WTU+Global k-means.
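
To make the idea of a data-cluster association matrix concrete, the following minimal sketch builds the binary matrix from a set of base clusterings and then fills its zero entries with a cluster-to-cluster similarity. The abstract does not give the WETU formula, so the refinement here is an assumption: plain Jaccard overlap between cluster memberships is used as a stand-in for the link-based measure, and the function names (association_matrix, jaccard, refine) are hypothetical. It illustrates only the general shape of the approach: a binary matrix becomes a real-valued one by comparing clusters across all base clusterings.

```python
def association_matrix(base_clusterings):
    """Binary data-cluster association matrix.

    One row per object, one column per cluster of every base clustering.
    Entry is 1.0 if the object belongs to that cluster, else 0.0.
    """
    n = len(base_clusterings[0])
    columns = []  # (base clustering index, cluster label) for each column
    for m, labels in enumerate(base_clusterings):
        for label in sorted(set(labels)):
            columns.append((m, label))
    matrix = [
        [1.0 if base_clusterings[m][i] == label else 0.0 for (m, label) in columns]
        for i in range(n)
    ]
    return matrix, columns


def jaccard(a, b):
    """Jaccard overlap of two member sets (stand-in similarity, not WETU)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def refine(matrix):
    """Replace each 0 entry with the best similarity between that cluster
    and any cluster the object does belong to, across all base clusterings."""
    n, k = len(matrix), len(matrix[0])
    members = [{i for i in range(n) if matrix[i][c] == 1.0} for c in range(k)]
    refined = []
    for i in range(n):
        own = [c for c in range(k) if matrix[i][c] == 1.0]
        refined.append([
            1.0 if matrix[i][c] == 1.0
            else max(jaccard(members[c], members[o]) for o in own)
            for c in range(k)
        ])
    return refined


# Two base clusterings of four objects (toy data, invented for illustration).
base = [[0, 0, 1, 1], [0, 1, 1, 1]]
binary, columns = association_matrix(base)
real_valued = refine(binary)
for row in real_valued:
    print([round(v, 3) for v in row])
```

The resulting real-valued rows can then be passed to any clustering method that accepts real-valued feature vectors; the abstract names Global k-means for this consensus step.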

Author information

Corresponding author

Correspondence to Li-Juan Wang.

About this article

Cite this article

Hao, ZF., Wang, LJ., Cai, RC. et al. An improved clustering ensemble method based link analysis. World Wide Web 18, 185–195 (2015). https://doi.org/10.1007/s11280-013-0208-6

  • DOI: https://doi.org/10.1007/s11280-013-0208-6
