Multi-view subspace text clustering

  • Research
  • Published in: Journal of Intelligent Information Systems

Abstract

Text clustering has become an important challenge in artificial intelligence, since many applications need to organize documents automatically into homogeneous topics. Given the availability of several text representation models, text documents can be organized through a multi-view text clustering approach. In this context, we propose a new multi-view subspace text clustering method (MVSTC). The proposed method offers a rich representation of text by integrating several models that capture different aspects of text, such as syntactic, topic, and semantic features. MVSTC discovers latent correlations between documents by projecting the data onto a topological map. It seeks a low-rank and sparse subspace representation that captures both the global and the local structure of multi-view textual data. Extensive experiments on real text data sets show that the method outperforms existing multi-view clustering methods on several evaluation metrics.
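As a rough illustration of the two regularizers named in the abstract, the sketch below shows the standard proximal operators usually associated with them: entry-wise soft thresholding for a sparsity (ℓ1) penalty and singular value thresholding for a low-rank (nuclear norm) penalty. This is not the MVSTC algorithm itself. The function names, the threshold `tau`, and the toy matrices are assumptions made for illustration only, and the abstract does not say which solver the method uses.

```python
import numpy as np


def soft_threshold(M, tau):
    """Proximal operator of tau * ||M||_1 (hypothetical helper name).

    Shrinks every entry toward zero by tau, which promotes sparsity.
    """
    return np.sign(M) * np.maximum(np.abs(M) - tau, 0.0)


def singular_value_threshold(M, tau):
    """Proximal operator of tau * ||M||_* (hypothetical helper name).

    Shrinks every singular value by tau, which promotes low rank.
    """
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    shrunk = np.maximum(s - tau, 0.0)
    # Scale the columns of U by the shrunken singular values, then recombine.
    return (U * shrunk) @ Vt


# Toy inputs (assumed values, not data from the article).
entries = np.array([[3.0, -0.5], [-2.0, 0.2]])
sparse_result = soft_threshold(entries, tau=1.0)

diagonal = np.diag([3.0, 0.5])
low_rank_result = singular_value_threshold(diagonal, tau=1.0)
```

In this toy run, entries with magnitude below `tau` are zeroed by `soft_threshold`, and the smaller singular value of `diagonal` is removed by `singular_value_threshold`, so the second result has rank one. Solvers for low-rank and sparse subspace models commonly alternate steps of this kind, but how MVSTC combines them across views is described only in the full article.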

Data Availability

The Reuters dataset is available at http://kdd.ics.uci.edu/databases/reuters21578/reuters21578.html. The BBC Sport dataset is available at http://mlg.ucd.ie/datasets/bbc.html. The 20 Newsgroup dataset is available at https://www.cs.cmu.edu/afs/cs.cmu.edu/project/theo-20/www/data/news20.html. The WebKB dataset is available at https://www.cs.cmu.edu/afs/cs.cmu.edu/project/theo-20/www/data/.

Notes

  1. http://www.daviddlewis.com/resources/testcollections/reuters21578/

  2. http://qwone.com/~jason/20Newsgroups/

  3. https://www.cs.cmu.edu/afs/cs.cmu.edu/project/theo-20/www/data/

  4. http://mlg.ucd.ie/datasets/bbc.html

Acknowledgements

Not applicable

Funding

Not applicable

Author information

Contributions

All authors contributed to the study’s conception and design. In detail, M. Fraj mainly contributed to conceptualization, methodology, coding and experimentation, and manuscript writing. M.A.B. Hajkacem and N. Essoussi were involved in conceptualization, methodology, and supervision. All authors agreed on the results and contributed to the final manuscript.

Corresponding author

Correspondence to Maha Fraj.

Ethics declarations

Competing Interests

The authors declare no competing interests.

Ethical Approval

Not applicable

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Fraj, M., HajKacem, M.A.B. & Essoussi, N. Multi-view subspace text clustering. J Intell Inf Syst 62, 1583–1606 (2024). https://doi.org/10.1007/s10844-024-00897-2
