Abstract
Multi-view clustering has gained importance in recent times due to the large-scale generation of data, often from multiple sources. Multi-view clustering refers to clustering a set of objects that are described by multiple sets of features, known as views; for example, movies can be described by their list of actors or by a textual summary of their plot. Co-clustering, on the other hand, refers to the simultaneous grouping of data samples and features under the assumption that samples exhibit a pattern only under a subset of features. This paper combines multi-view clustering with co-clustering and proposes a new Weighted Multi-View Co-Clustering (WMVCC) algorithm. The motivation behind the approach is to use the diversity of features provided by multiple sources of information while exploiting the power of co-clustering. The proposed method extends the clustering objective function to a unified co-clustering objective function across all views. The algorithm follows the k-means strategy and iteratively optimizes the clustering by updating cluster labels, features, and view weights. A local search is also employed to refine the clustering result using weighted multi-step paths in a graph. Experiments are conducted on several benchmark datasets. The results show that the proposed approach converges quickly and that, on sparse datasets, its clustering performance is significantly better than that of other recent, state-of-the-art algorithms.
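The k-means-style alternation described above (assign labels, update per-view statistics, update view weights) can be illustrated with a small sketch. This is not the WMVCC algorithm: it clusters samples only, omitting the feature co-clustering and the graph-based local search. The objective Σ_v w_v^β · SSE_v, the exponent `beta`, the closed-form weight rule, the maximin seeding, and the function name `weighted_multiview_kmeans` are all hypothetical choices made for this illustration.

```python
import numpy as np


def weighted_multiview_kmeans(views, k, beta=2.0, max_iter=100):
    """Illustrative weighted multi-view k-means (hypothetical; not WMVCC itself).

    views : list of (n, d_v) float arrays that describe the same n samples.
    Assumed objective: sum_v w_v**beta * SSE_v, with sum_v w_v = 1 and beta > 1.
    Returns (labels, w): one cluster label per sample and one weight per view.
    """
    n = views[0].shape[0]
    w = np.full(len(views), 1.0 / len(views))  # start with equal view weights

    def weighted_dist(cents, w):
        # (n, c) matrix of view-weighted squared Euclidean distances to c centroids
        d = np.zeros((n, cents[0].shape[0]))
        for wv, X, C in zip(w, views, cents):
            d += wv**beta * ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
        return d

    # Deterministic maximin (farthest-point) seeding, an assumption of this sketch:
    # start from sample 0, then repeatedly add the sample farthest from all seeds.
    idx = [0]
    while len(idx) < k:
        d = weighted_dist([X[idx] for X in views], w)
        idx.append(int(d.min(axis=1).argmax()))
    cents = [X[idx].astype(float) for X in views]  # one (k, d_v) centroid matrix per view

    labels = None
    for _ in range(max_iter):
        # Step 1: assign each sample to the cluster with the smallest weighted distance.
        new_labels = weighted_dist(cents, w).argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # assignments are stable, so the loop has converged
        labels = new_labels
        # Step 2: per-view centroids are member means; an empty cluster keeps its old centroid.
        for X, C in zip(views, cents):
            for j in range(k):
                members = X[labels == j]
                if len(members):
                    C[j] = members.mean(axis=0)
        # Step 3: closed-form view weights, w_v proportional to SSE_v ** (-1 / (beta - 1)).
        sse = np.array([((X - C[labels]) ** 2).sum() for X, C in zip(views, cents)])
        inv = (sse + 1e-12) ** (-1.0 / (beta - 1.0))
        w = inv / inv.sum()
    return labels, w
```

With β > 1, minimising Σ_v w_v^β · SSE_v subject to Σ_v w_v = 1 gives w_v ∝ SSE_v^(-1/(β-1)), so a view whose clusters are tighter receives a larger weight. For example, with one well-separated view and one pure-noise view, the sketch should assign most of the weight to the first view. This is one common weighting choice; the paper's own update rules may differ.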







Data availability
Publicly available data (references mentioned in text).
Notes
Links to datasets used are available here: https://sites.google.com/site/fawadsyed/datasets
Acknowledgements
Khadija Khan would like to thank the Ghulam Ishaq Khan Institute of Engineering Sciences and Technology, Topi, Pakistan for providing her a fully funded scholarship to pursue the MS degree under its GA-1 scheme.
Code availability
The code will be released in the future.
Funding
This work is part of a graduate thesis (Ms. Khadija Khan) funded by the Ghulam Ishaq Khan Institute under its scholarship (GA-1) scheme.
Author information
Contributions
Syed Fawad Hussain is credited with conceiving the idea, editing the manuscript, and writing parts of the text (the method description and the analysis of results); Khadija Khan wrote the code, including implementations of some of the competing methods, and ran the simulations; Rashad Jillani wrote a substantial part of the manuscript and generated the graphs.
Ethics declarations
Conflicts of interest/competing interests
None.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Hussain, S.F., Khan, K. & Jillani, R. Weighted multi-view co-clustering (WMVCC) for sparse data. Appl Intell 52, 398–416 (2022). https://doi.org/10.1007/s10489-021-02405-3
DOI: https://doi.org/10.1007/s10489-021-02405-3