A Sequential Three-Way Approach to Constructing a Co-association Matrix in Consensus Clustering

Hu, Mengjun; Deng, Xiaofei; Yao, Yiyu

doi:10.1007/978-3-319-99368-3_47

A Sequential Three-Way Approach to Constructing a Co-association Matrix in Consensus Clustering

Mengjun Hu¹⁷,
Xiaofei Deng¹⁷ &
Yiyu Yao¹⁷

Conference paper
First Online: 15 August 2018

1088 Accesses
1 Citations

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 11103))

Abstract

The main task in consensus clustering is to produce an optimal output clustering based on a set of input clusterings. The co-association matrix based consensus clustering methods are easy to understand and implement. However, they usually have high computational cost with big datasets, which restricts their applications. We propose a sequential three-way approach to constructing the co-association matrix progressively in multiple stages. In each stage, based on a set of input clusterings, we evaluate how likely two data points are associated and accordingly, divide a set of data-point pairs into three disjoint positive, negative and boundary regions. A data-point pair in the positive region is associated with a definite decision of clustering the two data points together. A pair in the negative region is associated with a definite decision of separating the two data points into different clusters. For a pair in the boundary region, we do not have sufficient information to make a definite decision. The decision on such a pair is deferred into the next stage where more input clusterings will be involved. By making quick decisions on early stages, the overall computational cost of constructing the matrix and the consensus clustering may be reduced.

This work is partially supported by a Discovery Grant from NSERC, Canada.

This is a preview of subscription content, log in via an institution.

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 39.99; Price excludes VAT (USA)

Softcover Book: USD 54.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Learn about institutional subscriptions

Notes

References

Chiu, D.S., Talhouk, A.: diceR: an R package for class discovery using an ensemble driven approach. BMC Bioinform. 19, 11–18 (2018)
Article Google Scholar
Dempster, A., Laird, N., Rubin, D.: Maximum likelihood from incomplete data via the EM algorithm (with discussion). J. Royal Stat. Soc. Ser. B 39, 1–38 (1977)
MATH Google Scholar
Deng, X.F., Yao, Y.Y.: An information-theoretic interpretation of thresholds in probabilistic rough sets. In: Li, T., et al. (eds.) RSKT 2012. LNCS (LNAI), vol. 7414, pp. 369–378. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-31900-6_46
Donath, W.E., Hoffman, A.J.: Algorithms for partitioning of graphs and computer logic based on eigenvectors of connection matrices. IBM Tech. Discl. Bull. 15, 938–944 (1972)
Google Scholar
Ester, M., Kriegel, H.P., Sander, J., Xu, X.W.: A density-based algorithm for discovering clusters in large spatial databases with noise. In: Simoudis, E., et al. (eds.) KDD 1996, pp. 226–231. AAAI Press (1996)
Google Scholar
Fred, A.: Finding consistent clusters in data partitions. In: Kittler, J., Roli, F. (eds.) MCS 2001. LNCS, vol. 2096, pp. 309–318. Springer, Heidelberg (2001). https://doi.org/10.1007/3-540-48219-9_31
Chapter Google Scholar
Fred, A., Jain, A.K.: Combining multiple clustering using evidence accumulation. IEEE Trans. Pattern Anal. Mach. Intell. 27, 835–850 (2005)
Article Google Scholar
Hastie, T., Tibshirani, R., Friedman, J.: The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd edn. Springer, New York (2009). https://doi.org/10.1007/978-0-387-84858-7
Book MATH Google Scholar
Herbert, J.P., Yao, J.T.: Game-theoretic rough sets. Fundamenta Informaticae 108, 267–286 (2011)
MathSciNet MATH Google Scholar
Hubert, L., Arabie, P.: Comparing partitions. J. Classif. 2, 193–218 (1985)
Article Google Scholar
Iam-on, N., Boongoen, T., Garrett, S.: LCE: a link-based cluster ensemble method for improved gene expression data analysis. Bioinformatics 26, 1513–1519 (2010)
Article Google Scholar
Iam-on, N., Boongoen, T., Garrett, S.: Refining pairwise similarity matrix for cluster ensemble problem with cluster relations. In: Jean-Fran, J.-F., Berthold, M.R., Horváth, T. (eds.) DS 2008. LNCS, vol. 5255, pp. 222–233. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-88411-8_22
Chapter Google Scholar
Li, Y., Yu, J., Hao, P., Li, Z.: Clustering ensembles based on normalized edges. In: Zhou, Z.-H., Li, H., Yang, Q. (eds.) PAKDD 2007. LNCS, vol. 4426, pp. 664–671. Springer, Heidelberg (2007). https://doi.org/10.1007/978-3-540-71701-0_71
Chapter Google Scholar
Li, H.X., Zhang, L.B., Huang, B., Zhou, X.Z.: Sequential three-way decision and granulation for cost-sensitive face recognition. Knowl. Based Syst. 91, 241–251 (2016)
Article Google Scholar
Li, H.X., Zhang, L.B., Zhou, X.Z., Huang, B.: Cost-sensitive sequential three-way decision modeling using a deep neural network. Int. J. Approx. Reason. 85, 68–78 (2017)
Article MathSciNet Google Scholar
MacQueen, J.B.: Some methods for classification and analysis of multivariate observations. In: Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability, pp. 281–297. University of California Press (1967)
Google Scholar
Meila, M.: Comparing clusterings - an information based distance. J. Multivar. Anal. 98, 873–895 (2007)
Article MathSciNet Google Scholar
Sokal, R., Michener, C.: A statistical method for evaluating systematic relationships. Univ. Kansas Sci. Bull. 38, 1409–1438 (1958)
Google Scholar
Strehl, A., Ghosh, J.: Cluster ensembles - a knowledge reuse framework for combining multiple partitions. J. Mach. Learn. Res. 3, 583–617 (2002)
MathSciNet MATH Google Scholar
Vega-Pons, S., Ruiz-Shulcloper, J.: A survey of clustering ensemble algorithms. Int. J. Pattern Recogn. Artif. Intell. 25, 337–372 (2011)
Article MathSciNet Google Scholar
Vega-Pons, S., Ruiz-Shulcloper, J.: Clustering ensemble method for heterogeneous partitions. In: Bayro-Corrochano, E., Eklundh, J.-O. (eds.) CIARP 2009. LNCS, vol. 5856, pp. 481–488. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-10268-4_56
Chapter Google Scholar
Wang, X., Yang, C., Zhou, J.: Clustering aggregation by probability accumulation. Pattern Recogn. 42, 668–675 (2009)
Article Google Scholar
Yao, Y.Y.: An outline of a theory of three-way decisions. In: Yao, J.T., et al. (eds.) RSCTC 2012. LNCS, vol. 7413, pp. 1–17. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-32115-3_1
Yao, Y.Y.: Probabilistic rough set approximations. Int. J. Approx. Reason. 49, 255–271 (2008)
Article Google Scholar
Yao, Y.Y., Deng, X.F.: Sequential three-way decisions with probabilistic rough sets. In: Wang, Y., et al. (eds.) ICCI-CC 2011, pp. 120–125 (2011)
Google Scholar
Yao, Y.Y., Hu, M., Deng, X.F.: Modes of sequential three-way classifications. In: Medina, J., Ojeda-Aciego, M., Verdegay, J.L., Pelta, D.A., Cabrera, I.P., Bouchon-Meunier, B., Yager, R.R. (eds.) IPMU 2018. CCIS, vol. 854, pp. 724–735. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-91476-3_59
Yao, Y.Y., Lingras, P., Wang, R., Miao, D.: Interval set cluster analysis: a re-formulation. In: Sakai, H., Chakraborty, M.K., Hassanien, A.E., Ślęzak, D., Zhu, W. (eds.) RSFDGrC 2009. LNCS, vol. 5908, pp. 398–405. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-10646-0_48
Yu, H.: A framework of three-way cluster analysis. In: Polkowski, L., et al. (eds.) IJCRS 2017. LNCS, vol. 10314, pp. 300–312. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-60840-2_22
Chapter Google Scholar
Yu, H., Wang, X., Wang, G.: A semi-supervised three-way clustering framework for multi-view data. In: Polkowski, L., et al. (eds.) IJCRS 2017. LNCS, vol. 10314, pp. 313–325. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-60840-2_23
Chapter Google Scholar
Yu, H., Zhang, H.: A three-way decision clustering approach for high dimensional data. In: Flores, V., et al. (eds.) IJCRS 2016. LNCS, vol. 9920, pp. 229–239. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-47160-0_21
Chapter Google Scholar

Download references

Author information

Authors and Affiliations

Department of Computer Science, University of Regina, Regina, SK, S4S 0A2, Canada
Mengjun Hu, Xiaofei Deng & Yiyu Yao

Authors

Mengjun Hu
View author publications
You can also search for this author in PubMed Google Scholar
Xiaofei Deng
View author publications
You can also search for this author in PubMed Google Scholar
Yiyu Yao
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Mengjun Hu .

Editor information

Editors and Affiliations

University of Warsaw, Warsaw, Poland
Hung Son Nguyen
Faculty of Information Technology, Vietnam National University, Hanoi, Vietnam
Quang-Thuy Ha
School of Information Science, Southwest Jiaotong University, Chengdu, China
Tianrui Li
Institute of Computer Science, University of Silesia, Sosnowiec, Poland
Małgorzata Przybyła-Kasperek

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Hu, M., Deng, X., Yao, Y. (2018). A Sequential Three-Way Approach to Constructing a Co-association Matrix in Consensus Clustering. In: Nguyen, H., Ha, QT., Li, T., Przybyła-Kasperek, M. (eds) Rough Sets. IJCRS 2018. Lecture Notes in Computer Science(), vol 11103. Springer, Cham. https://doi.org/10.1007/978-3-319-99368-3_47

Download citation

DOI: https://doi.org/10.1007/978-3-319-99368-3_47
Published: 15 August 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-99367-6
Online ISBN: 978-3-319-99368-3
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics