Optimization and improvement of data mining algorithm based on efficient incremental kernel fuzzy clustering for large data

Zhang, Cuifen; Hao, Lina; Fan, Li

doi:10.1007/s10586-018-1767-1

Published: 05 March 2018

Volume 22, pages 3001–3010, (2019)

Cluster Computing

376 Accesses
18 Citations

Abstract

The arrival of the big data era has left traditional data mining algorithms unable to meet the accuracy and efficiency requirements of big data mining. Therefore, this paper optimized a data mining algorithm based on efficient incremental kernel fuzzy clustering for big data. First, the methods of big data mining and the fuzzy clustering techniques used in data mining were summarized. Then, the data mining algorithm based on incremental kernel fuzzy clustering was optimized. Finally, the method was validated by comparison with the stKFCM algorithm. The verification results showed that the improved algorithm was superior in performance and accuracy, with only a slight gap in running time.
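
For readers unfamiliar with the family of methods the abstract names, the following is a minimal sketch of plain kernel fuzzy c-means on a single in-memory chunk of data. It is not the authors' optimized incremental algorithm and not stKFCM, neither of which is described in this preview. The function names `gaussian_kernel` and `kernel_fcm`, the Gaussian kernel choice, and all parameter defaults (`sigma`, `m`, `n_iter`, `tol`, `seed`) are assumptions made for illustration.

```python
import numpy as np


def gaussian_kernel(X, sigma=1.0):
    """Gaussian (RBF) kernel matrix for the rows of X (illustrative choice)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma ** 2))


def kernel_fcm(K, c, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Plain kernel fuzzy c-means on a precomputed kernel matrix K.

    Cluster centers live implicitly in feature space as weighted means of
    the mapped points, so only K is needed. Returns the (n, c) membership
    matrix U, whose rows sum to 1.
    """
    n = K.shape[0]
    rng = np.random.default_rng(seed)
    U = rng.random((n, c))
    U /= U.sum(axis=1, keepdims=True)        # random fuzzy partition to start
    diag = np.diag(K)
    for _ in range(n_iter):
        W = U ** m
        W /= W.sum(axis=0, keepdims=True)    # per-cluster weights, columns sum to 1
        KW = K @ W                           # (n, c)
        # Squared feature-space distance from each point to each center:
        # K_ii - 2 * sum_l w_lj K_il + w_j^T K w_j
        D = diag[:, None] - 2.0 * KW + np.sum(W * KW, axis=0)[None, :]
        D = np.maximum(D, 1e-12)             # guard against division by zero
        inv = D ** (-1.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.max(np.abs(U_new - U)) < tol
        U = U_new
        if converged:
            break
    return U
```

An incremental variant would, broadly, run a step like this on one chunk at a time and carry a compact summary of earlier chunks forward. How the paper does that is not stated in the abstract, so it is left out here.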

Acknowledgements

This work was supported by Science and Technology Program of Colleges and Universities of Shandong province (J15LN11), China; Key projects of Education Department of Shandong Province (C2016M058), China; Key projects of Education Department of Sichuan Province (16ZA0090), China; National Social Science Fund Project (17BGL058), China; Humanity and Social Science Research Foundation of Ministry of Education (15YJA790051), China.

Author information

Authors and Affiliations

School of Information Technology, Shandong Women’s University, Jinan, 250300, China
Cuifen Zhang
College of Earth Sciences, Chengdu University of Technology (CDUT), Chengdu, 610059, China
Lina Hao
Key Laboratory of Geoscience Spatial Information Technology, Ministry of Land and Resources of the People’s Republic of China, Chengdu, 610059, China
Lina Hao
Shandong Jianzhu University, Jinan, 250101, China
Li Fan

Corresponding author

Correspondence to Lina Hao.

About this article

Cite this article

Zhang, C., Hao, L. & Fan, L. Optimization and improvement of data mining algorithm based on efficient incremental kernel fuzzy clustering for large data. Cluster Comput 22 (Suppl 2), 3001–3010 (2019). https://doi.org/10.1007/s10586-018-1767-1

Received: 18 September 2017
Revised: 28 October 2017
Accepted: 08 January 2018
Published: 05 March 2018
Issue Date: March 2019
DOI: https://doi.org/10.1007/s10586-018-1767-1
