Elephant search algorithm applied to data clustering

  • Focus
  • Soft Computing

Abstract

Data clustering is one of the most popular branches of machine learning and data analysis. Partitioning-based clustering algorithms, such as K-means, are prone to producing a set of clusters that is far from perfect because of their probabilistic nature. The clustering process starts with random partitions and then attempts to improve them progressively, so different initial partitions can result in different final clusters. Trying all possible candidate clusterings to find the perfect result is computationally expensive. Meta-heuristic algorithms aim to search for the global optimum in high-dimensional problems, and they have been applied successfully to data clustering problems, where they seek a near-optimal solution in terms of the quality of the resulting clusters. In this paper, a new meta-heuristic search method named the elephant search algorithm (ESA) is integrated into K-means, forming a new data clustering algorithm, namely C-ESA. The advantage of C-ESA lies in its dual features of (i) evolutionary operations and (ii) a balance of local intensification and global exploration. The results of C-ESA are compared with those of classical clustering algorithms including K-means, DBSCAN, and GMM-EM, and C-ESA is shown to outperform them in terms of clustering accuracy in a computer simulation. C-ESA is also applied to time series clustering and compared with the classical algorithms K-means and Fuzzy C-means and with the classical meta-heuristic algorithm PSO; it outperforms these algorithms in terms of clustering accuracy. C-ESA also remains comparable with the state-of-the-art time series clustering algorithm K-shape.
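The sensitivity to initial partitions described in the abstract can be illustrated with a small sketch. This is not C-ESA or the authors' code: it is a plain K-means in standard-library Python, run from several random seeds on synthetic data, with the best run picked by sum of squared errors (SSE). The data generator, the function names (`kmeans`, `make_blobs`), the seed range, and the use of SSE as the quality measure are all assumptions made for illustration.

```python
import random

def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    labels = []
    for pt in points:
        dists = [sum((p - c) ** 2 for p, c in zip(pt, cen)) for cen in centroids]
        labels.append(dists.index(min(dists)))
    return labels

def sse(points, centroids, labels):
    """Sum of squared distances from each point to its assigned centroid."""
    return sum(
        sum((p - c) ** 2 for p, c in zip(pt, centroids[lab]))
        for pt, lab in zip(points, labels)
    )

def kmeans(points, k, seed, max_iter=100):
    """Plain K-means; the seed controls the random initial centroids."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = assign(points, centroids)
    for _ in range(max_iter):
        for j in range(k):
            members = [pt for pt, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
        new_labels = assign(points, centroids)
        if new_labels == labels:  # converged: assignments no longer change
            break
        labels = new_labels
    return centroids, labels, sse(points, centroids, labels)

def make_blobs(seed=0, per_cluster=30):
    """Hypothetical test data: four Gaussian blobs in 2-D."""
    rng = random.Random(seed)
    centers = [(0.0, 0.0), (5.0, 0.0), (0.0, 5.0), (5.0, 5.0)]
    return [
        (cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5))
        for cx, cy in centers
        for _ in range(per_cluster)
    ]

points = make_blobs()
# Run K-means from 20 different initial partitions and record the final SSE.
results = {seed: kmeans(points, 4, seed)[2] for seed in range(20)}
best_seed = min(results, key=results.get)
print("SSE range across seeds:", min(results.values()), max(results.values()))
print("best seed:", best_seed)
```

Random restarts of this kind are the simplest way to search over initial partitions. A meta-heuristic such as the ESA proposed in the paper would replace the independent restarts with a guided population-based search, which this sketch does not attempt to reproduce.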



Acknowledgements

The authors are grateful for financial support from the research grants (1) ‘Nature-Inspired Computing and Metaheuristics Algorithms for Optimizing Data Mining Performance’ from the University of Macau (Grant No. MYRG2016-00069-FST); (2) ‘Temporal Data Stream Mining by Using Incrementally Optimized Very Fast Decision Forest (iOVFDF)’, also from the University of Macau (Grant No. MYRG2015-00128-FST); and (3) ‘A Scalable Data Stream Mining Methodology: Stream-based Holistic Analytics and Reasoning in Parallel’ from FDCT, Macau SAR government (Grant No. FDCT/126/2014/A3).

Author information

Corresponding author

Correspondence to Kelvin K. L. Wong.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Communicated by S. Deb, T. Hanne, K.C. Wong.

About this article

Cite this article

Deb, S., Tian, Z., Fong, S. et al. Elephant search algorithm applied to data clustering. Soft Comput 22, 6035–6046 (2018). https://doi.org/10.1007/s00500-018-3076-2

  • DOI: https://doi.org/10.1007/s00500-018-3076-2
