Abstract
Aggregating data from different databases into a data warehouse produces multidimensional data such as data cubes. Because of the 3D structure of the data, clustering a data cube poses significant challenges. In this paper, new preprocessing techniques and a novel hybridization of DBSCAN and a fuzzy earthworm optimization algorithm (EWOA) are proposed to address these challenges. The proposed preprocessing consists of assigning an address to each cube cell, applying a dimension move to create related 2D data from the data cube, and defining a new similarity metric. DBSCAN, a density-based clustering algorithm, is applied to the related 2D data with both the Euclidean metric and the newly proposed similarity metric; the two variants are called DBSCAN1 and DBSCAN2. A new hybridization of EWOA and DBSCAN, called EWOA–DBSCAN, is proposed to improve DBSCAN. In addition, to tune the parameters of EWOA dynamically, a fuzzy logic controller is designed with two separate groups of fuzzy rules, Mamdani (EWOA–DBSCAN-Mamdani) and Sugeno (EWOA–DBSCAN-Sugeno). These ideas aim to provide efficient and flexible unsupervised analysis of a data cube by using a meta-heuristic algorithm to optimize DBSCAN’s parameters and by dynamically tuning the parameters of that algorithm to increase efficiency. To evaluate efficiency, the proposed algorithms are compared with DBSCAN1, GA-DBSCAN1, GA-DBSCAN1-Mamdani and GA-DBSCAN1-Sugeno. The experimental results, obtained over 20 runs, indicate that the proposed ideas achieved their targets.
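
The general idea of turning a cube into 2D data and clustering it by density can be illustrated with a minimal sketch. The abstract does not specify the paper's address scheme, dimension move, or new similarity metric, so everything below is an assumption made for illustration: the hypothetical helper `flatten_cube` simply uses the index triple (i, j, k) as the cell address, and `dbscan` is a textbook DBSCAN with plain Euclidean distance. The parameters `eps` and `min_pts` are the kind of DBSCAN parameters that a meta-heuristic such as EWOA could search over.

```python
from math import dist  # Euclidean distance, Python 3.8+

def flatten_cube(cube):
    """Turn a 3D cube (nested lists, cube[i][j][k]) into 2D rows.

    Each row is (i, j, k, value): an assumed cell address plus the measure.
    This is an illustrative scheme, not the paper's addressing method.
    """
    rows = []
    for i, plane in enumerate(cube):
        for j, line in enumerate(plane):
            for k, value in enumerate(line):
                if value is not None:  # skip empty cells
                    rows.append((i, j, k, value))
    return rows

def dbscan(points, eps, min_pts, metric=dist):
    """Textbook DBSCAN. Returns one label per point; -1 marks noise."""
    labels = [None] * len(points)

    def neighbors(p):
        # Brute-force region query; the point itself is included.
        return [q for q in range(len(points))
                if metric(points[p], points[q]) <= eps]

    cluster = -1
    for p in range(len(points)):
        if labels[p] is not None:
            continue
        seeds = neighbors(p)
        if len(seeds) < min_pts:
            labels[p] = -1  # provisional noise, may become a border point
            continue
        cluster += 1
        labels[p] = cluster
        queue = [q for q in seeds if q != p]
        while queue:
            q = queue.pop()
            if labels[q] == -1:
                labels[q] = cluster  # noise reached from a core point
            if labels[q] is not None:
                continue
            labels[q] = cluster
            q_neighbors = neighbors(q)
            if len(q_neighbors) >= min_pts:  # q is a core point: expand
                queue.extend(q_neighbors)
    return labels

# A tiny 1 x 1 x 6 cube with two groups of similar measures.
cube = [[[1.0, 1.1, 0.9, 9.0, 9.1, 8.9]]]
rows = flatten_cube(cube)
labels = dbscan(rows, eps=1.5, min_pts=2)
```

Passing a different `metric` callable is where an alternative similarity measure could be swapped in; the quality of the result depends strongly on `eps` and `min_pts`, which is why tuning them automatically is attractive.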
Funding
The study is not funded by any agency.
Ethics declarations
Conflict of interest
The authors declare that there is no conflict of interest regarding the publication of this paper.
Ethical approval
The manuscript does not contain any studies with human participants or animals performed by any of the authors.
Additional information
Communicated by V. Loia.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix
See Tables 6, 7, 8, 9, 10, 11, 12 and 13.
Cite this article
Hosseini Rad, M., Abdolrazzagh-Nezhad, M. A new hybridization of DBSCAN and fuzzy earthworm optimization algorithm for data cube clustering. Soft Comput 24, 15529–15549 (2020). https://doi.org/10.1007/s00500-020-04881-0
DOI: https://doi.org/10.1007/s00500-020-04881-0