Dynamic frequency based parallel k-bat algorithm for massive data clustering (DFBPKBA)

  • Original Article
International Journal of System Assurance Engineering and Management

Abstract

Over the past decade, the volume of digital data has grown significantly, so effective data mining techniques are important for better decision making. Clustering is one of the key elements of data mining. K-means is a popular algorithm that is widely used for clustering. However, K-means tends to get stuck in local optima because it depends on the random initialization of the initial cluster centers. In this paper, a novel variant of the Bat algorithm based on dynamic frequency is introduced. The proposed variant is then hybridized with K-means to present a new approach for clustering in a distributed environment. Because evolutionary computation is computationally intensive, traditional sequential algorithms cannot provide satisfactory results within a reasonable amount of time for large-scale data problems. To mitigate this problem, the proposed variant is parallelized using the MapReduce model in the Hadoop framework. The experimental results show that the proposed algorithm outperforms K-means, PSO and the Bat algorithm on eighty percent of the benchmark datasets in terms of intra-cluster distance. Further, DFBPKBA achieves significant speedup on massive datasets as the number of nodes increases.
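To make the evaluation criterion concrete, the following is a minimal Python sketch of an intra-cluster distance computed in a map/reduce style. It is not the authors' implementation: the abstract does not define the measure or the job layout, so the definition used here (sum of Euclidean distances from each point to its nearest centroid), the function names, and the toy data are all assumptions made for illustration. Plain Python lists stand in for Hadoop input splits.

```python
import math
from functools import reduce

def nearest_centroid_distance(point, centroids):
    """Euclidean distance from a point to its closest centroid."""
    return min(math.dist(point, c) for c in centroids)

def map_partition(points, centroids):
    """Map step (hypothetical): partial distance sum for one data split."""
    return sum(nearest_centroid_distance(p, centroids) for p in points)

def intra_cluster_distance(partitions, centroids):
    """Reduce step (hypothetical): add up the partial sums from all splits."""
    partials = [map_partition(part, centroids) for part in partitions]
    return reduce(lambda a, b: a + b, partials, 0.0)

# Toy data: two splits, each holding one tight group of points.
partitions = [
    [(0.0, 0.0), (0.0, 2.0)],
    [(10.0, 0.0), (10.0, 2.0)],
]

# A candidate solution is a set of centroids; lower fitness is better.
good_centroids = [(0.0, 1.0), (10.0, 1.0)]
poor_centroids = [(5.0, 0.0), (5.0, 2.0)]

good_fit = intra_cluster_distance(partitions, good_centroids)
poor_fit = intra_cluster_distance(partitions, poor_centroids)
print(good_fit, poor_fit)
```

Under this assumed definition, a poorly placed set of centroids scores much worse than a well placed one. A search over candidate centroid sets, such as the bat-based one named in the abstract, would use a value like this as the fitness to minimize, and the partial sums per split are what allow the evaluation to be distributed across nodes.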

Author information

Corresponding author

Correspondence to Kapil Sharma.

About this article

Cite this article

Tripathi, A.K., Sharma, K. & Bala, M. Dynamic frequency based parallel k-bat algorithm for massive data clustering (DFBPKBA). Int J Syst Assur Eng Manag 9, 866–874 (2018). https://doi.org/10.1007/s13198-017-0665-x

  • DOI: https://doi.org/10.1007/s13198-017-0665-x