Incremental document clustering using fuzzy-based optimization strategy

Yarlagadda, Madhulika; Kancherla, Gangadhara Rao; Atluri, Srikrishna

doi:10.1007/s12065-019-00335-1

Incremental document clustering using fuzzy-based optimization strategy

Research Paper
Published: 17 December 2019

Volume 13, pages 497–510, (2020)
Evolutionary Intelligence

Madhulika Yarlagadda^1,2, Gangadhara Rao Kancherla^3 & Srikrishna Atluri^2


Abstract

Technical advances in information systems have made massive numbers of documents available in electronic form, such as e-mails and web pages, which makes arranging and browsing for a required document a complex task. This paper proposes an incremental document clustering method that proceeds in three steps: pre-processing, feature extraction, and incremental document categorization. Pre-processing removes artifacts and redundant data from the documents through stop-word removal and stemming. Feature extraction is based on Term Frequency-Inverse Document Frequency (TF-IDF) and WordNet features, and features are selected using a support measure named ModSupport. Incremental document clustering is then performed using the hybrid fuzzy bounding degree together with the Rider-Moth Flame Optimization algorithm (RMFO). RMFO, designed by integrating the Rider Optimization Algorithm (ROA) with Moth Flame Optimization (MFO), selects the optimal weights for the boundary degree model. The proposed RMFO outperformed the existing techniques in terms of accuracy, F-measure, precision, and recall, with maximal values of 93.98%, 94.876%, 93.958% and 93.964%, respectively.
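The first two steps named in the abstract (stop-word removal, stemming, and TF-IDF weighting) can be illustrated with a minimal sketch. This is not the authors' implementation: the stop-word list, the crude `naive_stem` suffix stripper, the function names, and the sample documents are all assumptions made for illustration, and the WordNet features, ModSupport selection, and RMFO-based clustering are not shown.

```python
import math
from collections import Counter

# Hypothetical, tiny stop-word list for illustration only.
STOP_WORDS = {"the", "a", "an", "of", "and", "is", "in", "to", "on"}


def naive_stem(token):
    """Crude suffix stripping; a stand-in for a real stemmer such as Porter's."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Lowercase, keep alphabetic tokens, drop stop words, then stem."""
    tokens = [t for t in text.lower().split() if t.isalpha()]
    return [naive_stem(t) for t in tokens if t not in STOP_WORDS]


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    Uses relative term frequency and idf = log(N / df), one common
    TF-IDF variant; the paper may use a different formulation.
    """
    tokenized = [preprocess(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks) or 1  # avoid division by zero for empty documents
        vectors.append(
            {t: (c / total) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return vectors


docs = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "stocks fell on trading news",
]
vectors = tf_idf(docs)
```

In this toy corpus, a term that occurs in only one document ("sat") receives a higher weight than one shared by two documents ("cat"), which is the property that makes TF-IDF useful as a clustering feature.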


Author information

Authors and Affiliations

Department of Computer Science and Engineering, JNTUK, Kakinada, Andhra Pradesh, India
Madhulika Yarlagadda
Department of Information Technology, RVR&JC College of Engineering, Chowdavaram, Guntur, Andhra Pradesh, India
Madhulika Yarlagadda & Srikrishna Atluri
Department of Computer Science and Engineering, Acharya Nagarjuna University, Guntur, Andhra Pradesh, India
Gangadhara Rao Kancherla

Corresponding author

Correspondence to Madhulika Yarlagadda.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Yarlagadda, M., Kancherla, G.R. & Atluri, S. Incremental document clustering using fuzzy-based optimization strategy. Evol. Intel. 13, 497–510 (2020). https://doi.org/10.1007/s12065-019-00335-1

Received: 29 May 2019
Revised: 15 November 2019
Accepted: 11 December 2019
Published: 17 December 2019
Issue Date: September 2020
DOI: https://doi.org/10.1007/s12065-019-00335-1
