Weighted multi-view co-clustering (WMVCC) for sparse data

Applied Intelligence

Abstract

Multi-view clustering has gained importance in recent times due to the large-scale generation of data, often from multiple sources. Multi-view clustering refers to clustering a set of objects, each described by multiple sets of features known as views; for example, movies can be described by their list of actors or by a textual summary of their plot. Co-clustering, on the other hand, refers to the simultaneous grouping of data samples and features, under the assumption that samples exhibit a pattern only under a subset of features. This paper combines multi-view clustering with co-clustering and proposes a new Weighted Multi-View Co-Clustering (WMVCC) algorithm. The motivation behind the approach is to use the diversity of features provided by multiple sources of information while exploiting the power of co-clustering. The proposed method extends the clustering objective function into a unified co-clustering objective function across all views. The algorithm follows the k-means strategy and iteratively optimizes the clustering by updating cluster labels, features, and view weights. A local search that uses weighted multi-step paths in a graph is also employed to refine the clustering result. Experiments are conducted on several benchmark datasets. The results show that the proposed approach converges quickly and that, on sparse datasets, its clustering performance is significantly better than that of other recent and state-of-the-art algorithms.
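The abstract describes a k-means-style loop that alternately updates cluster labels and view weights. As a rough illustration of the view-weighting idea only, here is a minimal pure-Python sketch of a generic weighted multi-view k-means. It is not the WMVCC objective (which is defined in the full paper): it omits feature co-clustering and the graph-based local search. The function name `weighted_multiview_kmeans`, the exponent `beta`, random initialization, and the closed-form weight update w_v = 1 / sum_u (D_v / D_u)^(1/(beta-1)), borrowed from standard feature-weighted k-means, are all assumptions made for illustration.

```python
import random
from typing import List, Tuple

Vector = List[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def weighted_multiview_kmeans(
    views: List[List[Vector]], k: int, beta: float = 2.0,
    max_iter: int = 50, seed: int = 0,
) -> Tuple[List[int], List[float]]:
    """Generic weighted multi-view k-means (hypothetical sketch, NOT WMVCC).

    views[v][i] is the feature vector of object i in view v. All views share
    one label per object; view v has weight w_v >= 0 with sum(w) == 1, and the
    cost is sum_v w_v**beta * (within-cluster squared error in view v).
    """
    if beta <= 1.0:
        raise ValueError("beta must be > 1 for the closed-form weight update")
    n_views, n = len(views), len(views[0])
    rng = random.Random(seed)
    seeds = rng.sample(range(n), k)  # random objects serve as initial centers
    centers = [[list(view[i]) for i in seeds] for view in views]
    weights = [1.0 / n_views] * n_views
    labels: List[int] = []

    for _ in range(max_iter):
        # Step 1: assign each object to the cluster with the lowest weighted cost.
        def cost(i: int, c: int) -> float:
            return sum(weights[v] ** beta * sq_dist(views[v][i], centers[v][c])
                       for v in range(n_views))
        new_labels = [min(range(k), key=lambda c: cost(i, c)) for i in range(n)]

        # Step 2: recompute per-view centers (keep the old center if a cluster empties).
        for v in range(n_views):
            for c in range(k):
                members = [views[v][i] for i in range(n) if new_labels[i] == c]
                if members:
                    centers[v][c] = [sum(col) / len(members) for col in zip(*members)]

        # Step 3: update view weights from each view's dispersion D_v;
        # views with smaller within-cluster error receive larger weights.
        disp = [sum(sq_dist(views[v][i], centers[v][new_labels[i]]) for i in range(n))
                + 1e-12 for v in range(n_views)]
        expo = 1.0 / (beta - 1.0)
        weights = [1.0 / sum((disp[v] / disp[u]) ** expo for u in range(n_views))
                   for v in range(n_views)]

        if new_labels == labels:  # labels stopped changing: converged
            break
        labels = new_labels
    return labels, weights


# Toy data (hypothetical): two views of the same six objects.
view_a = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
view_b = [[0.0], [0.5], [1.0], [5.0], [5.5], [6.0]]
labels, weights = weighted_multiview_kmeans([view_a, view_b], k=2)
print("labels:", labels, "view weights:", [round(w, 3) for w in weights])
```

Because the weight update favours views with small within-cluster error, the second view, whose clusters are tighter in this toy data, ends up with the larger weight. In practice the views would usually need normalising first, since the update compares raw dispersions across views.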

Data availability

Publicly available data (references mentioned in text).

Notes

  1. Links to datasets used are available here: https://sites.google.com/site/fawadsyed/datasets

Acknowledgements

Khadija Khan would like to thank the Ghulam Ishaq Khan Institute of Engineering Sciences and Technology, Topi, Pakistan, for providing her with a fully funded scholarship to pursue her MS degree under its GA-1 scheme.

Code availability

The code will be released in the future.

Funding

This work is part of a graduate thesis (Ms. Khadija Khan) funded by the Ghulam Ishaq Khan Institute under its scholarship (GA-1) scheme.

Author information

Contributions

Syed Fawad Hussain is credited with conceptualizing the idea, editing, and writing portions of the text (method description and analysis of results); Khadija Khan wrote the code, including that of some of the competing methods, and ran the simulations; Rashad Jillani helped write a substantial part of the manuscript and generate the graphs.

Corresponding author

Correspondence to Syed Fawad Hussain.

Ethics declarations

Conflicts of interest/competing interests

None.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Hussain, S.F., Khan, K. & Jillani, R. Weighted multi-view co-clustering (WMVCC) for sparse data. Appl Intell 52, 398–416 (2022). https://doi.org/10.1007/s10489-021-02405-3

  • DOI: https://doi.org/10.1007/s10489-021-02405-3

Keywords