Abstract
Star-structured high-order heterogeneous data are ubiquitous. Such data represent objects of a certain type that are connected to other types of data, or features, so that the overall data schema forms a star structure of inter-relationships. In this paper, we study the problem of co-clustering star-structured high-order heterogeneous data. We present a new solution, MHCoC, a Hierarchical High-order Co-clustering Algorithm by Maximizing Modularity, which iteratively optimizes a modularity-based objective function and converges to a unique clustering result. In contrast to traditional co-clustering methods, MHCoC merges information from the multiple feature spaces of high-order heterogeneous data. Moreover, MHCoC takes a top-down strategy, performing a greedy divisive procedure that generates a tree-like hierarchical clustering result revealing the relationships between clusters. To illustrate the process in more detail, we design a toy example that shows how MHCoC selects the appropriate co-cluster and splits it. Extensive experiments on real-world datasets demonstrate the effectiveness of the proposed method.
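As a rough illustration of what a modularity-based co-clustering objective over a star structure can look like, the following minimal sketch computes the standard bipartite modularity of one object-feature matrix and sums it over several feature spaces. The abstract does not give the exact MHCoC objective, so the function names (`bipartite_modularity`, `star_modularity`), the unweighted sum over feature spaces, and the toy matrices are assumptions made for illustration only.

```python
import numpy as np


def bipartite_modularity(A, row_labels, col_labels):
    """Bipartite modularity of a co-clustering of matrix A.

    Q = (1/|A|) * sum_ij (a_ij - a_i. * a_.j / |A|) * [row i and column j share a co-cluster]
    """
    A = np.asarray(A, dtype=float)
    total = A.sum()
    if total == 0:
        return 0.0
    # Expected weight of each cell under the independence (null) model.
    expected = np.outer(A.sum(axis=1), A.sum(axis=0)) / total
    # Mask of cells whose row and column carry the same co-cluster label.
    same = np.asarray(row_labels)[:, None] == np.asarray(col_labels)[None, :]
    return float(((A - expected) * same).sum() / total)


def star_modularity(views, row_labels, col_labels_per_view):
    """Assumed star-structure objective: plain sum of the per-view modularities.

    All views share the same central objects (rows); each has its own feature columns.
    """
    return sum(
        bipartite_modularity(A, row_labels, cols)
        for A, cols in zip(views, col_labels_per_view)
    )


# Toy data: four central objects described by two hypothetical feature spaces.
view1 = np.array([[1, 1, 0, 0],
                  [1, 1, 0, 0],
                  [0, 0, 1, 1],
                  [0, 0, 1, 1]])
view2 = np.array([[1, 0],
                  [1, 0],
                  [0, 1],
                  [0, 1]])

# One co-cluster containing everything versus a split into two co-clusters.
q_before = star_modularity([view1, view2], [0, 0, 0, 0], [[0, 0, 0, 0], [0, 0]])
q_after = star_modularity([view1, view2], [0, 0, 1, 1], [[0, 0, 1, 1], [0, 1]])
print(q_before, q_after)
```

In a greedy divisive scheme of the kind the abstract describes, a candidate split would be accepted only if it raises the objective, as the two-cluster partition does here relative to the single co-cluster.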
Acknowledgements
This work is supported by the National Natural Science Foundation of China (Nos. 61762078, 61363058, 61762079), the Guangxi Key Laboratory of Trusted Software (No. kx201910), and the Research Fund of Guangxi Key Lab of Multi-source Information Mining & Security (MIMS18-08).
Ethics declarations
Conflict of interest
No conflicts of interest.
About this article
Cite this article
Wei, J., Ma, H., Liu, Y. et al. Hierarchical high-order co-clustering algorithm by maximizing modularity. Int. J. Mach. Learn. & Cyber. 12, 2887–2898 (2021). https://doi.org/10.1007/s13042-021-01375-9
DOI: https://doi.org/10.1007/s13042-021-01375-9