An elitism based self-adaptive multi-population Poor and Rich optimization algorithm for grouping similar documents

Thirumoorthy, K.; Muneeswaran, K.

doi:10.1007/s12652-021-02955-x

Original Research
Published: 25 February 2021

Volume 13, pages 1925–1939, (2022)
Journal of Ambient Intelligence and Humanized Computing

Abstract

In this digital era, grouping similar documents from web archives is a difficult and computationally expensive task. In this paper, we propose an elitism based self-adaptive multi-population Poor and Rich optimization algorithm for grouping similar documents, referred to as ESAMPRO. The objective function of the proposed work maximizes accuracy and minimizes the intra-cluster distance. The proposed algorithm is evaluated using various extrinsic cluster quality metrics. An in-depth analysis of the experimental results on four supervised benchmark datasets confirms that the proposed ESAMPRO algorithm outperforms five well-known document clustering algorithms: K-means, particle swarm optimization, whale optimization, dragonfly, and grey wolf optimization.
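
The abstract names two ingredients of the objective, accuracy and intra-cluster distance, without giving their exact form. The following is a minimal sketch of how such quantities could be computed for a candidate clustering. It is not the authors' formulation: purity stands in for accuracy as one common extrinsic metric, and the function names, the `weight` parameter, and the weighted difference in `fitness` are all assumptions made for illustration.

```python
import math
from collections import Counter


def intra_cluster_distance(points, labels):
    """Mean Euclidean distance from each point to the centroid of its cluster."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    total = 0.0
    for members in clusters.values():
        dim = len(members[0])
        centroid = [sum(m[i] for m in members) / len(members) for i in range(dim)]
        total += sum(math.dist(m, centroid) for m in members)
    return total / len(points)


def purity(labels, truth):
    """Extrinsic metric: share of points that belong to the majority class of their cluster."""
    by_cluster = {}
    for label, true_class in zip(labels, truth):
        by_cluster.setdefault(label, Counter())[true_class] += 1
    return sum(max(counts.values()) for counts in by_cluster.values()) / len(labels)


def fitness(points, labels, truth, weight=0.5):
    """Hypothetical combined score (higher is better); the weighting is an assumption."""
    return weight * purity(labels, truth) - (1 - weight) * intra_cluster_distance(points, labels)


# Toy data: two well-separated groups of document vectors with known classes.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
truth = ["a", "a", "b", "b"]
good = [0, 0, 1, 1]  # matches the true grouping
bad = [0, 1, 0, 1]   # mixes the two groups

print(fitness(points, good, truth), fitness(points, bad, truth))
```

On this toy data the assignment that matches the true classes scores higher than the mixed one, which is the behaviour a population-based optimizer would exploit when ranking candidate solutions.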

Acknowledgements

We would like to thank the anonymous reviewers for their helpful comments and advice in improving this work. We would also like to thank the Management and Principal of Mepco Schlenk Engineering College (Autonomous), Sivakasi, for providing us with the state-of-the-art facilities to carry out this research work in the Mepco Research Centre in collaboration with Anna University, Chennai, Tamil Nadu, India.

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Mepco Schlenk Engineering College, Sivakasi, 626005, India
K. Thirumoorthy & K. Muneeswaran

Corresponding author

Correspondence to K. Thirumoorthy.

Ethics declarations

Conflicts of interest

The authors declare that there are no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Thirumoorthy, K., Muneeswaran, K. An elitism based self-adaptive multi-population Poor and Rich optimization algorithm for grouping similar documents. J Ambient Intell Human Comput 13, 1925–1939 (2022). https://doi.org/10.1007/s12652-021-02955-x

Received: 02 August 2020
Accepted: 04 February 2021
Published: 25 February 2021
Issue Date: April 2022
DOI: https://doi.org/10.1007/s12652-021-02955-x
