Dynamic frequency based parallel k-bat algorithm for massive data clustering (DFBPKBA)

Tripathi, Ashish Kumar; Sharma, Kapil; Bala, Manju

doi:10.1007/s13198-017-0665-x

Original Article
Published: 06 September 2017

Volume 9, pages 866–874, (2018)
International Journal of System Assurance Engineering and Management

379 Accesses
24 Citations

Abstract

Over the past decade, the volume of digital data has grown significantly, so effective data mining techniques are important for better decision making. Clustering is one of the key elements of data mining. K-means is a very popular clustering algorithm in the literature. However, k-means tends to get stuck in a local optimum because it depends on the random initialization of the initial cluster centers. In this paper, a novel variant of the Bat algorithm based on dynamic frequency is introduced. The proposed variant is then hybridized with K-means to present a new approach for clustering in a distributed environment. Since evolutionary computation is very computation intensive, traditional sequential algorithms cannot provide satisfactory results within a reasonable amount of time for large-scale data problems. To mitigate this problem, the proposed variant is parallelized using the MapReduce model in the Hadoop framework. The experimental results show that the proposed algorithm outperformed K-means, PSO and the Bat algorithm on eighty percent of the benchmark datasets in terms of intra-cluster distance. Further, DFBPKBA also achieved significant speedup on massive datasets as the number of nodes increased.
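
The abstract names intra-cluster distance as the comparison metric and MapReduce as the parallelization model, but it does not give the authors' formulation. The following is only a minimal sketch of how such a metric could be evaluated in a map-and-reduce style, assuming Euclidean distance and a sum over each point's nearest center. The function names, the toy data, and the in-memory "chunks" standing in for Hadoop input splits are all hypothetical and are not taken from the paper.

```python
import math
from functools import reduce


def nearest_centroid_distance(point, centroids):
    """Euclidean distance from a point to its closest centroid."""
    return min(math.dist(point, c) for c in centroids)


def map_partial_distance(chunk, centroids):
    """Map step (assumed): sum nearest-centroid distances over one data split."""
    return sum(nearest_centroid_distance(p, centroids) for p in chunk)


def intra_cluster_distance(chunks, centroids):
    """Reduce step (assumed): add up the partial sums from all splits."""
    partials = [map_partial_distance(chunk, centroids) for chunk in chunks]
    return reduce(lambda a, b: a + b, partials, 0.0)


# Toy data in two chunks, standing in for two input splits (hypothetical).
chunks = [
    [(0.0, 0.0), (0.0, 1.0)],
    [(10.0, 10.0), (10.0, 13.0)],
]
candidate = [(0.0, 0.0), (10.0, 10.0)]  # one candidate set of k = 2 centers
total = intra_cluster_distance(chunks, candidate)
```

In a metaheuristic such as the Bat algorithm or PSO, each candidate solution would encode one set of cluster centers, and a value like `total` could serve as its fitness, with lower being better. Because each split contributes an independent partial sum, the evaluation parallelizes naturally across nodes.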

Author information

Authors and Affiliations

Delhi Technological University, Delhi, India
Ashish Kumar Tripathi & Kapil Sharma
IP College of Women, Delhi, India
Manju Bala

Corresponding author

Correspondence to Kapil Sharma.

About this article

Cite this article

Tripathi, A.K., Sharma, K. & Bala, M. Dynamic frequency based parallel k-bat algorithm for massive data clustering (DFBPKBA). Int J Syst Assur Eng Manag 9, 866–874 (2018). https://doi.org/10.1007/s13198-017-0665-x

Received: 22 March 2017
Revised: 12 June 2017
Published: 06 September 2017
Issue Date: August 2018
DOI: https://doi.org/10.1007/s13198-017-0665-x
