Abstract
The arrival of the big data era in the new century has left traditional data mining algorithms unable to meet the accuracy and efficiency requirements of big data mining. Therefore, a data mining algorithm based on efficient incremental kernel fuzzy clustering for big data was optimized in this paper. First, the methods of big data mining and the fuzzy clustering techniques used in data mining were summarized. Then, the data mining algorithm based on incremental kernel fuzzy clustering was optimized. Finally, the method was validated by comparison with the stKFCM algorithm. The verification results showed that the improved algorithm was superior in performance and accuracy, with only a slight gap in running time.



Acknowledgements
This work was supported by the Science and Technology Program of Colleges and Universities of Shandong Province (J15LN11), China; Key Projects of the Education Department of Shandong Province (C2016M058), China; Key Projects of the Education Department of Sichuan Province (16ZA0090), China; the National Social Science Fund Project (17BGL058), China; and the Humanity and Social Science Research Foundation of the Ministry of Education (15YJA790051), China.
Cite this article
Zhang, C., Hao, L. & Fan, L. Optimization and improvement of data mining algorithm based on efficient incremental kernel fuzzy clustering for large data. Cluster Comput 22 (Suppl 2), 3001–3010 (2019). https://doi.org/10.1007/s10586-018-1767-1
DOI: https://doi.org/10.1007/s10586-018-1767-1