Abstract
Structural deep document clustering methods, which use deep neural networks to learn document representations from both structural information and inherent data properties, have recently attracted growing research interest. However, the structural information in these methods is usually static: it remains unchanged throughout the clustering process, which can degrade the clustering results when the initial structural information is inaccurate or noisy. In this paper, we present an adaptive structural enhanced representation learning network for document clustering that adjusts the structural information with the help of clustering partitions. The network consists of two components: an adaptive structure learner, which automatically evaluates and adjusts structural information at both the document and term levels so that more effective structural information is learned, and a structural enhanced representation learning network, which integrates the adjusted structural information to enhance text document representations while reducing noise, thereby improving the clustering results. Iterating between the clustering results and the adaptive structural enhanced representation learning network lets the two optimize each other, progressively improving model performance. Extensive experiments on various text document datasets demonstrate that the proposed method outperforms several state-of-the-art methods.
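As a rough intuition for the iterative loop described in the abstract, the following Python sketch alternates between enhancing representations with a document graph, clustering them, and adjusting the graph with the resulting partition. It is a toy illustration under stated assumptions, not the authors' network: both neural components are replaced by simple stand-ins. The initial structure is assumed to be a cosine k-nearest-neighbour graph, "structural enhancement" is plain neighbour averaging, clustering is k-means, and the hypothetical `refine_graph` step simply drops edges between documents assigned to different clusters. Term-level structure is not modelled, and every function name here is illustrative.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity between two dense vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def sq_dist(u, v):
    """Squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def knn_graph(X, k):
    """Assumed initial, static graph: link each document to its k most cosine-similar documents."""
    graph = []
    for i, x in enumerate(X):
        sims = sorted(((cosine(x, y), j) for j, y in enumerate(X) if j != i), reverse=True)
        graph.append({j for _, j in sims[:k]})
    return graph


def propagate(X, graph):
    """Stand-in for structure-enhanced representations: average each document with its neighbours."""
    out = []
    for i, row in enumerate(X):
        members = [row] + [X[j] for j in graph[i]]
        out.append([sum(col) / len(members) for col in zip(*members)])
    return out


def kmeans(X, n_clusters, iters=20, seed=0):
    """Lloyd's k-means; returns one cluster label per document."""
    rng = random.Random(seed)
    centers = [list(c) for c in rng.sample(X, n_clusters)]
    labels = [0] * len(X)
    for _ in range(iters):
        labels = [min(range(n_clusters), key=lambda c: sq_dist(x, centers[c])) for x in X]
        for c in range(n_clusters):
            members = [x for x, lab in zip(X, labels) if lab == c]
            if members:  # keep the old center if a cluster empties out
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def refine_graph(base_graph, labels):
    """Hypothetical adaptive step: keep only base edges whose endpoints share a cluster label."""
    return [{j for j in nbrs if labels[j] == labels[i]} for i, nbrs in enumerate(base_graph)]


def iterative_clustering(X, n_clusters, k=2, rounds=3):
    """Alternate between representation enhancement, clustering, and structure adjustment."""
    base = knn_graph(X, k)
    graph = base
    labels = []
    for _ in range(rounds):
        Z = propagate(X, graph)             # enhance representations with the current structure
        labels = kmeans(Z, n_clusters)      # cluster the enhanced representations
        graph = refine_graph(base, labels)  # adjust the structure using the new partition
    return labels, graph


if __name__ == "__main__":
    # Toy term-weight vectors for six documents forming two obvious topics.
    docs = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.8, 0.2, 0.0],
            [0.0, 0.0, 1.0], [0.1, 0.0, 0.9], [0.0, 0.2, 0.8]]
    labels, graph = iterative_clustering(docs, n_clusters=2)
    print(labels, graph)
```

Because `refine_graph` always starts from the initial k-NN graph, an edge pruned in one round can return if a later partition places its endpoints in the same cluster. The method described in the abstract instead adjusts structure with a learned component at both the document and term levels, which this sketch does not attempt.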
Graphical abstract
The overall framework of the adaptive structural enhanced representation learning network
Data Availability
Data will be made available on request.
Notes
The experiments were conducted on an NVIDIA Tesla P40 GPU with 24 GB of memory.
Acknowledgements
This work is supported by the National Natural Science Foundation of China under Grant No. 62066007 and by the Key Technology R&D Program of Guizhou Province under Nos. [2023]300 and [2022]277.
Author information
Contributions
Jingjing Xue: Writing - original draft, visualization, validation, methodology, investigation, formal analysis, data curation, conceptualization. Ruizhang Huang: Writing - review & editing, supervision, resources, project administration, methodology, funding acquisition, conceptualization. Ruina Bai: Writing - review, data verification. Yanping Chen: Supervision, resources, project administration, funding acquisition. Yongbin Qin: Supervision, resources, project administration, funding acquisition. Chuan Lin: Supervision, resources, project administration.
Ethics declarations
Competing interests
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Ethical and Informed Consent for Data Used
This study strictly adheres to ethical guidelines. Participants provided informed consent before data collection, ensuring privacy and voluntary participation. For inquiries, contact the principal investigator.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Xue, J., Huang, R., Bai, R. et al. Adaptive structural enhanced representation learning for deep document clustering. Appl Intell 54, 12315–12331 (2024). https://doi.org/10.1007/s10489-024-05791-6
DOI: https://doi.org/10.1007/s10489-024-05791-6