Multi-view subspace text clustering

Fraj, Maha; HajKacem, Mohamed Aymen Ben; Essoussi, Nadia

doi:10.1007/s10844-024-00897-2

Research
Published: 04 October 2024

Volume 62, pages 1583–1606, (2024)
Journal of Intelligent Information Systems

Abstract

Text clustering has become an important challenge in artificial intelligence because many applications need to organize documents automatically into homogeneous topics. Given the availability of several text representation models, text documents can be organized through a multi-view text clustering approach. In this context, we propose a new multi-view subspace text clustering method (MVSTC). The proposed method offers a rich representation of text by integrating several models that capture different aspects of text, such as syntactic, topic, and semantic features. MVSTC discovers latent correlations between documents by projecting the data onto a topological map. It then seeks a subspace representation, based on a low-rank and sparse representation, that captures both the global and the local structure of multi-view textual data. Extensive experiments on real text data sets demonstrate that our method outperforms existing multi-view clustering methods in terms of several evaluation metrics.
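The idea of a shared subspace representation across views can be illustrated with a small sketch. It is not the MVSTC algorithm: it assumes a ridge (least-squares) self-representation with a closed-form solution in place of the low-rank and sparse objective, it omits the topological map step, and the two toy views `X_syntactic` and `X_topic` as well as all function names are hypothetical.

```python
import numpy as np


def self_representation(X, lam=0.1):
    """Solve min_C ||X - X C||_F^2 + lam * ||C||_F^2 in closed form.

    X holds one document per column. The ridge penalty is an assumption
    of this sketch, standing in for a low-rank plus sparse regulariser.
    """
    gram = X.T @ X
    n_docs = gram.shape[0]
    return np.linalg.solve(gram + lam * np.eye(n_docs), gram)


def consensus_affinity(views, lam=0.1):
    """Average the symmetrised |C| matrices obtained from each view."""
    n_docs = views[0].shape[1]
    affinity = np.zeros((n_docs, n_docs))
    for X in views:
        # Unit-normalise each document so the views share a comparable scale.
        X = X / np.maximum(np.linalg.norm(X, axis=0, keepdims=True), 1e-12)
        C = self_representation(X, lam)
        np.fill_diagonal(C, 0.0)  # discard self-links
        affinity += 0.5 * (np.abs(C) + np.abs(C.T))
    return affinity / len(views)


def two_way_spectral_split(affinity):
    """Split documents in two by the sign of the second Laplacian eigenvector."""
    degree = affinity.sum(axis=1)
    scale = 1.0 / np.sqrt(np.maximum(degree, 1e-12))
    laplacian = np.eye(len(degree)) - scale[:, None] * affinity * scale[None, :]
    _, vectors = np.linalg.eigh(laplacian)  # eigenvalues in ascending order
    return (vectors[:, 1] > 0).astype(int)


# Hypothetical toy data: 6 documents (columns), two views with different features.
X_syntactic = np.array([
    [1.0, 0.9, 0.8, 0.1, 0.0, 0.1],
    [0.8, 1.0, 0.9, 0.0, 0.1, 0.0],
    [0.1, 0.0, 0.1, 1.0, 0.8, 0.9],
    [0.0, 0.1, 0.0, 0.9, 1.0, 0.8],
])
X_topic = np.array([
    [0.9, 0.8, 0.9, 0.1, 0.2, 0.1],
    [0.1, 0.2, 0.1, 0.9, 0.8, 0.9],
    [0.3, 0.3, 0.3, 0.3, 0.3, 0.3],
])

affinity = consensus_affinity([X_syntactic, X_topic])
labels = two_way_spectral_split(affinity)
print(labels)
```

In this toy setting the first three documents and the last three documents are built to lie near two different subspaces in both views, so the averaged affinity is close to block-diagonal and the sign split separates the two groups. Which group receives label 0 is arbitrary, because the sign of an eigenvector is not determined.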

Data Availability

The Reuters data set is available at http://kdd.ics.uci.edu/databases/reuters21578/reuters21578.html. The BBC Sport data set is available at http://mlg.ucd.ie/datasets/bbc.html. The 20 Newsgroups data set is available at https://www.cs.cmu.edu/afs/cs.cmu.edu/project/theo-20/www/data/news20.html. The WebKB data set is available at https://www.cs.cmu.edu/afs/cs.cmu.edu/project/theo-20/www/data/.

Acknowledgements

Not applicable

Funding

Not applicable

Author information

Authors and Affiliations

Université de Tunis, Institut Supérieur de Gestion de Tunis, 41, Rue de la Liberté, Cité Bouchoucha 2000 Le Bardo, Tunis, Tunisie
Maha Fraj, Mohamed Aymen Ben HajKacem & Nadia Essoussi

Contributions

All authors contributed to the study’s conception and design. In detail, M. Fraj mainly contributed to conceptualization, methodology, coding and experimentation, and manuscript writing. M.A.B. HajKacem and N. Essoussi were involved in conceptualization, methodology, and supervision. All authors agreed on the results and contributed to the final manuscript.

Corresponding author

Correspondence to Maha Fraj.

Ethics declarations

Competing Interests

The authors declare no competing interests.

Ethical Approval

Not applicable

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Fraj, M., HajKacem, M.A.B. & Essoussi, N. Multi-view subspace text clustering. J Intell Inf Syst 62, 1583–1606 (2024). https://doi.org/10.1007/s10844-024-00897-2

Received: 26 February 2024
Revised: 30 August 2024
Accepted: 27 September 2024
Published: 04 October 2024
Issue Date: December 2024
DOI: https://doi.org/10.1007/s10844-024-00897-2

Part of a collection:

Data-Centric AI
