Abstract
Text clustering has become an important challenge in artificial intelligence because many applications require documents to be organized automatically into homogeneous topics. Given the availability of several text representation models, text documents can be organized through a multi-view text clustering approach. In this context, we propose a new multi-view subspace text clustering method (MVSTC). The proposed method offers a rich representation of text by integrating several models that capture different aspects of text, such as syntactic, topic, and semantic features. MVSTC discovers latent correlations between documents by projecting the data onto a topological map. It then seeks a subspace representation that is both low-rank and sparse, in order to capture the global and local structure of multi-view textual data. Extensive experiments on real text data sets demonstrate that our method outperforms existing multi-view clustering methods on several evaluation metrics.
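As an illustration of what a low-rank and sparse representation involves, the sketch below shows the two standard proximal operators that solvers for such objectives commonly rely on: entrywise soft thresholding for the sparsity (l1) term and singular value thresholding for the low-rank (nuclear norm) term. This is a minimal sketch, not the MVSTC algorithm itself; NumPy, the function names `soft_threshold` and `singular_value_threshold`, and the toy matrices are all assumptions made for illustration.

```python
import numpy as np


def soft_threshold(M, tau):
    """Proximal operator of tau * ||M||_1: shrink every entry toward zero by tau."""
    return np.sign(M) * np.maximum(np.abs(M) - tau, 0.0)


def singular_value_threshold(M, tau):
    """Proximal operator of tau * ||M||_*: shrink the singular values by tau."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    # Scale the columns of U by the shrunken singular values, then recompose.
    return (U * soft_threshold(s, tau)) @ Vt


# Toy coefficient matrix (hypothetical values, not taken from the paper).
C = np.array([[3.0, -1.0],
              [0.5, -2.0]])
sparse_C = soft_threshold(C, 1.0)  # small entries are set to zero

# Toy matrix with singular values 3 and 1; a threshold of 2 removes the second.
D = np.diag([3.0, 1.0])
low_rank_D = singular_value_threshold(D, 2.0)

print(sparse_C)
print(np.linalg.matrix_rank(low_rank_D))
```

Soft thresholding zeroes weak pairwise coefficients, which favors local structure, while singular value thresholding reduces the rank of the whole matrix, which favors global structure. How the two terms are weighted and combined across views is specific to the method and is not shown here.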
Data Availability
The Reuters data set is available at http://kdd.ics.uci.edu/databases/reuters21578/reuters21578.html. The BBC Sport data set is available at http://mlg.ucd.ie/datasets/bbc.html. The 20 Newsgroups data set is available at https://www.cs.cmu.edu/afs/cs.cmu.edu/project/theo-20/www/data/news20.html. The WebKB data set is available at https://www.cs.cmu.edu/afs/cs.cmu.edu/project/theo-20/www/data/.
Acknowledgements
Not applicable
Funding
Not applicable
Author information
Contributions
All authors contributed to the study's conception and design. In detail, M. Fraj mainly contributed to conceptualization, methodology, coding and experimentation, and manuscript writing. M.A.B. Hajkacem and N. Essoussi were involved in conceptualization, methodology, and supervision. All authors agreed on the results and contributed to the final manuscript.
Ethics declarations
Competing Interests
The authors declare no competing interests.
Ethical Approval
Not applicable
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Fraj, M., HajKacem, M.A.B. & Essoussi, N. Multi-view subspace text clustering. J Intell Inf Syst 62, 1583–1606 (2024). https://doi.org/10.1007/s10844-024-00897-2
DOI: https://doi.org/10.1007/s10844-024-00897-2