Abstract
In this digital era, grouping similar documents from web archives is a difficult and computationally expensive task. In this paper, we propose an elitism based self-adaptive multi-population Poor and Rich optimization algorithm, referred to as ESAMPRO, for grouping similar documents. The objective function of the proposed work maximizes the accuracy and minimizes the intra-cluster distance. The proposed algorithm is evaluated using various extrinsic cluster quality metrics. An in-depth analysis of the experimental results on four supervised benchmark datasets confirms that the proposed ESAMPRO algorithm outperforms five well-known document clustering algorithms: K-means, particle swarm optimization, whale optimization, dragonfly and grey wolf optimization.
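The two quantities named in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact ESAMPRO objective or say which extrinsic metrics are used, so the code below makes assumptions: intra-cluster distance is taken as the mean Euclidean distance from each document vector to its cluster centroid, and purity stands in as one common extrinsic metric. The function names `intra_cluster_distance` and `purity` are hypothetical and not taken from the paper.

```python
import math
from collections import Counter


def intra_cluster_distance(points, labels):
    """Mean Euclidean distance from each point to its cluster centroid.

    Assumed definition for illustration; the paper's formula may differ.
    """
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)

    total = 0.0
    for members in clusters.values():
        dim = len(members[0])
        # Centroid = component-wise mean of the cluster's members.
        centroid = [sum(m[i] for m in members) / len(members) for i in range(dim)]
        total += sum(math.dist(m, centroid) for m in members)
    return total / len(points)


def purity(labels, truth):
    """Fraction of documents that belong to the majority class of their cluster."""
    by_cluster = {}
    for label, cls in zip(labels, truth):
        by_cluster.setdefault(label, Counter())[cls] += 1
    return sum(max(counts.values()) for counts in by_cluster.values()) / len(labels)


# Toy example: four 2-D "document vectors" assigned to two clusters.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
labels = [0, 0, 1, 1]
truth = ["a", "a", "b", "a"]

print(intra_cluster_distance(points, labels))  # 1.0
print(purity(labels, truth))                   # 0.75
```

A lower intra-cluster distance indicates tighter clusters, while a higher purity indicates better agreement with the ground-truth classes. An optimizer such as the one proposed would search over cluster assignments or centroids to trade these off.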
Acknowledgements
We would like to thank the anonymous reviewers for their helpful comments and advice in improving this work. We would also like to thank the Management and Principal of Mepco Schlenk Engineering College (Autonomous), Sivakasi, for providing the state-of-the-art facilities to carry out this research work in the Mepco Research Centre in collaboration with Anna University Chennai, Tamil Nadu, India.
Ethics declarations
Conflicts of interest
The authors declare that they have no competing interests.
About this article
Cite this article
Thirumoorthy, K., Muneeswaran, K. An elitism based self-adaptive multi-population Poor and Rich optimization algorithm for grouping similar documents. J Ambient Intell Human Comput 13, 1925–1939 (2022). https://doi.org/10.1007/s12652-021-02955-x
DOI: https://doi.org/10.1007/s12652-021-02955-x