Pairwise Similarity for Cluster Ensemble Problem: Link-Based and Approximate Approaches

Iam-On, Natthakan; Boongoen, Tossapon

doi:10.1007/978-3-642-40069-8_5

Natthakan Iam-On¹⁸ &
Tossapon Boongoen¹⁹

Part of the book series: Lecture Notes in Computer Science ((TLDKS,volume 7980))

429 Accesses
1 Citations

Abstract

Cluster ensemble methods have emerged as powerful techniques, aggregating several input data clusterings to generate a single output clustering, with improved robustness and stability. In particular, link-based similarity techniques have recently been introduced with superior performance to the conventional co-association method. Their potential and applicability are, however limited due to the underlying time complexity. In light of such shortcoming, this paper presents two approximate approaches that mitigate the problem of time complexity: the approximate algorithm approach (Approximate SimRank Based Similarity matrix) and the approximate data approach (Prototype-based cluster ensemble model). The first approach involves decreasing the computational requirement of the existing link-based technique; the second reduces the size of the problem by finding a smaller, representative, approximate dataset, derived by a density-biased sampling technique. The advantages of both approximate approaches are empirically demonstrated over 22 datasets (both artificial and real data) and statistical comparisons of performance (with 95% confidence level) with three well-known validity criteria. Results obtained from these experiments suggest that approximate techniques can efficiently help scaling up the application of link-based similarity methods to wider range of data sizes.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 39.99; Price excludes VAT (USA)

Softcover Book: USD 54.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

References

Appel, A.P., Paterlini, A.A., de Sousa, E.P.M., Traina, A.J.M., Traina Jr., C.: A density-biased sampling technique to improve cluster representativeness. In: Kok, J.N., Koronacki, J., Lopez de Mantaras, R., Matwin, S., Mladenič, D., Skowron, A. (eds.) PKDD 2007. LNCS (LNAI), vol. 4702, pp. 366–373. Springer, Heidelberg (2007)
Chapter Google Scholar
Asuncion, A., Newman, D.J.: UCI machine learning repository (2007)
Google Scholar
Boulis, C., Ostendorf, M.: Combining multiple clustering systems. In: Boulicaut, J.-F., Esposito, F., Giannotti, F., Pedreschi, D. (eds.) PKDD 2004. LNCS (LNAI), vol. 3202, pp. 63–74. Springer, Heidelberg (2004)
Chapter Google Scholar
Calado, P., Cristo, M., Gonçalves, M.A., de Moura, E.S., Ribeiro-Neto, B.A., Ziviani, N.: Link-based similarity measures for the classification of web documents. JASIST 57(2), 208–221 (2006)
Article Google Scholar
de Castro, L.N.: Immune Engineering: Development of Computational Tools Inspired by the Artificial Immune Systems. Ph.D. thesis, DCA - FEEC/UNICAMP, Campinas/SP, Brazil (2001)
Google Scholar
Domeniconi, C., Al-Razgan, M.: Weighted cluster ensembles: Methods and analysis. ACM Transactions on Knowledge Discovery from Data 2(4), 1–40 (2009)
Google Scholar
Duda, R.O., Hart, P.E., Stork, D.G.: Pattern Classification, 2nd edn. Wiley-Interscience (November 2000)
Google Scholar
Fern, X.Z., Brodley, C.E.: Random projection for high dimensional data clustering: A cluster ensemble approach. In: Proceedings of International Conference on Machine Learning, pp. 186–193 (2003)
Google Scholar
Fern, X.Z., Brodley, C.E.: Solving cluster ensemble problems by bipartite graph partitioning. In: Proceedings of International Conference on Machine Learning, pp. 36–43 (2004)
Google Scholar
Fred, A.: Finding consistent clusters in data partitions. In: Kittler, J., Roli, F. (eds.) MCS 2001. LNCS, vol. 2096, pp. 309–318. Springer, Heidelberg (2001)
Chapter Google Scholar
Fred, A.L.N., Jain, A.K.: Data clustering using evidence accumulation. In: International Conference on Pattern Recognition, pp. 276–280 (2002)
Google Scholar
Fred, A.L.N., Jain, A.K.: Robust data clustering. In: International Conference on Pattern Recognition, pp. 128–136 (2003)
Google Scholar
Fred, A.L.N., Jain, A.K.: Combining multiple clusterings using evidence accumulation. IEEE Transactions on Pattern Analysis and Machine Intelligence 27(6), 835–850 (2005)
Article Google Scholar
Fred, A.L.N., Jain, A.K.: Learning pairwise similarity for data clustering. In: International Conference on Pattern Recognition, pp. 925–928 (2006)
Google Scholar
Gionis, A., Mannila, H., Tsaparas, P.: Clustering aggregation. In: Proceedings of International Conference on Data Engineering, pp. 341–352 (2005)
Google Scholar
Iam-on, N., Boongoen, T., Garrett, S.: Refining pairwise similarity matrix for cluster ensemble problem with cluster relations. In: Boulicaut, J.-F., Berthold, M.R., Horváth, T. (eds.) DS 2008. LNCS (LNAI), vol. 5255, pp. 222–233. Springer, Heidelberg (2008)
Chapter Google Scholar
Jain, A.K., Law, M.H.C.: Data clustering: A user’s dilemma. In: Pal, S.K., Bandyopadhyay, S., Biswas, S. (eds.) PReMI 2005. LNCS, vol. 3776, pp. 1–10. Springer, Heidelberg (2005)
Chapter Google Scholar
Jain, A.K., Murty, M.N., Flynn, P.J.: Data clustering: A review. ACM Computing Survey 31(3), 264–323 (1999)
Article Google Scholar
Jeh, G., Widom, J.: Simrank: A measure of structural-context similarity. In: Proceedings of ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 538–543 (2002)
Google Scholar
Karypis, G., Aggarwal, R., Kumar, V., Shekhar, S.: Multilevel hypergraph partitioning: applications in VLSI domain. IEEE Transactions on VLSI Systems 7(1), 69–79 (1999)
Article Google Scholar
Karypis, G., Kumar, V.: Multilevel k-way partitioning scheme for irregular graphs. Journal of Parallel Distributed Computing 48(1), 96–129 (1998)
Article MathSciNet Google Scholar
Kerdprasop, K., Kerdprasop, N., Sattayatham, P.: Density-biased clustering based on reservoir sampling. In: Proceedings of DEXA Workshops, pp. 1122–1126 (2005)
Google Scholar
Klink, S., Reuther, P., Weber, A., Walter, B., Ley, M.: Analysing social networks within bibliographical data. In: Bressan, S., Küng, J., Wagner, R. (eds.) DEXA 2006. LNCS, vol. 4080, pp. 234–243. Springer, Heidelberg (2006)
Chapter Google Scholar
Kollios, G., Gunopulos, D., Koudas, N., Berchtold, S.: Efficient biased sampling for approximate clustering and outlier detection in large data sets. IEEE Transactions on Knowledge and Data Engineering 15(5), 1170–1187 (2003)
Article Google Scholar
Kuncheva, L.I., Hadjitodorov, S.T.: Using diversity in cluster ensembles. In: Proceedings of the IEEE International Conference on Systems, Man and Cybernetics, pp. 1214–1219 (2004)
Google Scholar
Kuncheva, L.I., Vetrov, D.: Evaluation of stability of k-means cluster ensembles with respect to random initialization. IEEE Transactions on Pattern Analysis and Machine Intelligence 28(11), 1798–1808 (2006)
Article Google Scholar
Kyrgyzov, I.O., Maitre, H., Campedel, M.: A method of clustering combination applied to satellite image analysis. In: Proceedings of International Conference on Image Analysis and Processing, pp. 81–86 (2007)
Google Scholar
Monti, S., Tamayo, P., Mesirov, J.P., Golub, T.R.: Consensus clustering: A resampling-based method for class discovery and visualization of gene expression microarray data. Machine Learning 52(1-2), 91–118 (2003)
Article MATH Google Scholar
Nguyen, N., Caruana, R.: Consensus clusterings. In: Proceedings of IEEE International Conference on Data Mining, pp. 607–612 (2007)
Google Scholar
Palmer, C.R., Faloutsos, C.: Density biased sampling: an improved method for data mining and clustering. SIGMOD Records 29(2), 82–92 (2000)
Article Google Scholar
Rand, W.M.: Objective criteria for the evaluation of clustering methods. Journal of the American Statistical Association 66, 846–850 (1971)
Article Google Scholar
Strehl, A., Ghosh, J.: Cluster ensembles - a knowledge reuse framework for combining multiple partitions. Journal of Machine Learning Research 3, 583–617 (2002)
MathSciNet Google Scholar
Swift, S., Tucker, A., Vinciotti, V., Martin, N., Orengo, C., Liu, X., Kellam, P.: Consensus clustering and functional interpretation of gene-expression data. Genome Biology 5, R94 (2004)
Google Scholar
Topchy, A.P., Jain, A.K., Punch, W.F.: Combining multiple weak clusterings. In: Proceedings of IEEE International Conference on Data Mining, pp. 331–338 (2003)
Google Scholar
Topchy, A.P., Jain, A.K., Punch, W.F.: A mixture model for clustering ensembles. In: Proceedings of SIAM International Conference on Data Mining, pp. 379–390 (2004)
Google Scholar
Wolpert, D.H., Macready, W.G.: No free lunch theorems for search. Technical Report SFI-TR-95-02-010, Santa Fe Institute (1995)
Google Scholar
Xue, H., Chen, S., Yang, Q.: Discriminatively regularized least-squares classification. Pattern Recognition 42(1), 93–104 (2009)
Article MATH Google Scholar

Download references

Author information

Authors and Affiliations

School of Information Technology, Mae Fah Luang University, Chiang Rai, 57100, Thailand
Natthakan Iam-On
Department of Mathematics and Computer Science, Royal Thai Air Force Academy, Bangkok, 10220, Thailand
Tossapon Boongoen

Authors

Natthakan Iam-On
View author publications
You can also search for this author in PubMed Google Scholar
Tossapon Boongoen
View author publications
You can also search for this author in PubMed Google Scholar

Editor information

Editors and Affiliations

IRIT, Paul Sabatier University, 118, route de Narbonne,, 31062, Toulouse Cedex, France
Abdelkader Hameurlain
FAW, University of Linz, Altenbergerstraße 69, 4040, Linz, Austria
Josef Küng & Roland Wagner &

Rights and permissions

Reprints and permissions

Copyright information

About this chapter

Cite this chapter

Iam-On, N., Boongoen, T. (2013). Pairwise Similarity for Cluster Ensemble Problem: Link-Based and Approximate Approaches. In: Hameurlain, A., Küng, J., Wagner, R. (eds) Transactions on Large-Scale Data- and Knowledge-Centered Systems IX. Lecture Notes in Computer Science, vol 7980. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40069-8_5

Download citation

DOI: https://doi.org/10.1007/978-3-642-40069-8_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-40068-1
Online ISBN: 978-3-642-40069-8
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics