Adaptive structural enhanced representation learning for deep document clustering

Xue, Jingjing; Huang, Ruizhang; Bai, Ruina; Chen, Yanping; Qin, Yongbin; Lin, Chuan

doi:10.1007/s10489-024-05791-6

Published: 12 September 2024

Volume 54, pages 12315–12331, (2024)

Applied Intelligence

Jingjing Xue ORCID: orcid.org/0000-0001-5797-9713¹,
Ruizhang Huang¹,
Ruina Bai¹,
Yanping Chen¹,
Yongbin Qin¹ &
Chuan Lin¹

Abstract

Structural deep document clustering methods, which use deep neural networks to learn document representations from both structural information and inherent data properties, have recently attracted growing research interest. However, the structural information used in these methods is usually static and remains unchanged during the clustering process. This can harm the clustering results if the initial structural information is inaccurate or noisy. In this paper, we present an adaptive structural enhanced representation learning network for document clustering. The network adjusts the structural information with the help of clustering partitions and consists of two components. The first is an adaptive structure learner, which automatically evaluates and adjusts structural information at both the document and term levels to facilitate the learning of more effective structural information. The second is a structural enhanced representation learning network, which integrates this adjusted structural information to enhance text document representations while reducing noise, thereby improving the clustering results. The iterative process between clustering results and the adaptive structural enhanced representation learning network promotes mutual optimization, progressively enhancing model performance. Extensive experiments on various text document datasets demonstrate that the proposed method outperforms several state-of-the-art methods.
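The idea of adjusting structural information with the help of clustering partitions can be illustrated with a toy example. The sketch below is not the authors' model: it only shows one simple way a document graph might be re-weighted once cluster labels are available, by shrinking edges that cross cluster boundaries. The function names `adjust_structure` and `row_normalize`, the `penalty` parameter, and the four-document graph are all assumptions made for illustration.

```python
from typing import List

Matrix = List[List[float]]


def adjust_structure(adjacency: Matrix, labels: List[int], penalty: float = 0.1) -> Matrix:
    """Down-weight edges that connect documents assigned to different clusters.

    Hypothetical illustration only: `penalty` is an assumed scaling factor,
    not a parameter taken from the paper.
    """
    n = len(adjacency)
    adjusted: Matrix = []
    for i in range(n):
        row = []
        for j in range(n):
            weight = adjacency[i][j]
            # Edges inside a cluster (and self-loops) are kept unchanged;
            # edges across clusters are treated as likely noise and shrunk.
            if i != j and labels[i] != labels[j]:
                weight *= penalty
            row.append(weight)
        adjusted.append(row)
    return adjusted


def row_normalize(matrix: Matrix) -> Matrix:
    """Scale each row to sum to 1 so it can act as a propagation matrix."""
    normalized: Matrix = []
    for row in matrix:
        total = sum(row)
        normalized.append([w / total if total > 0 else 0.0 for w in row])
    return normalized


# Toy document graph with self-loops: documents 0 and 2 are linked,
# although the current clustering places them in different clusters.
adjacency = [
    [1.0, 1.0, 1.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
]
labels = [0, 0, 1, 1]  # assumed output of one clustering step

adjusted = adjust_structure(adjacency, labels, penalty=0.1)
propagation = row_normalize(adjusted)
```

In an iterative scheme of the kind the abstract describes, a step like this would alternate with representation learning and re-clustering, so that each new partition refines the graph used in the next round. How the paper actually evaluates and adjusts structure at the document and term levels is not specified in this preview.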

Graphical abstract

The overall framework of adaptive structural enhanced representation learning network

Data Availability

Data will be made available on request.

Notes

http://mlg.ucd.ie/datasets/bbc.html
https://www.aminer.cn/data
https://msnews.github.io/
http://mlg.ucd.ie/files/datasets/bbcsport.zip
The experiments were conducted on an NVIDIA Tesla P40 GPU with 24 GB of memory.

Acknowledgements

This work is supported by the National Natural Science Foundation of China under Grant No. 62066007, the Key Technology R&D Program of Guizhou Province No. [2023]300 and No. [2022]277.

Author information

Authors and Affiliations

Text Computing & Cognitive Intelligence Engineering Research Center of National Education Ministry, State Key Laboratory of Public Big Data, College of Computer Science and Technology, Guizhou University, Guiyang, 550000, Guizhou, China
Jingjing Xue, Ruizhang Huang, Ruina Bai, Yanping Chen, Yongbin Qin & Chuan Lin

Contributions

Jingjing Xue: Writing - original draft, visualization, validation, methodology, investigation, formal analysis, data curation, conceptualization. Ruizhang Huang: Writing - review & editing, supervision, resources, project administration, methodology, funding acquisition, conceptualization. Ruina Bai: Writing - review, data verification. Yanping Chen: Supervision, resources, project administration, funding acquisition. Yongbin Qin: Supervision, resources, project administration, funding acquisition. Chuan Lin: Supervision, resources, project administration.

Corresponding author

Correspondence to Ruizhang Huang.

Ethics declarations

Competing interests

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Ethical and Informed Consent for Data Used

This study strictly adheres to ethical guidelines. Participants provided informed consent before data collection, ensuring privacy and voluntary participation. For inquiries, contact the principal investigator.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Xue, J., Huang, R., Bai, R. et al. Adaptive structural enhanced representation learning for deep document clustering. Appl Intell 54, 12315–12331 (2024). https://doi.org/10.1007/s10489-024-05791-6

Accepted: 18 August 2024
Published: 12 September 2024
Issue Date: December 2024
DOI: https://doi.org/10.1007/s10489-024-05791-6
