Optimization and improvement of data mining algorithm based on efficient incremental kernel fuzzy clustering for large data

Cluster Computing

Abstract

With the arrival of the big data era, traditional data mining algorithms can no longer meet the accuracy and efficiency requirements of big data mining. This paper therefore optimizes a data mining algorithm based on efficient incremental kernel fuzzy clustering for big data. First, the methods of big data mining and the fuzzy clustering techniques used in data mining are summarized. Then, the data mining algorithm based on incremental kernel fuzzy clustering is optimized. Finally, the method is validated by comparison with the stKFCM algorithm. The verification results show that the improved algorithm is superior in performance and accuracy, with only a slight gap in running time.
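For readers unfamiliar with the general idea, incremental kernel fuzzy clustering can be sketched as follows. This is a minimal illustration under stated assumptions, not the algorithm optimized in the paper and not stKFCM: it assumes a Gaussian kernel with prototypes kept in input space, and a simple weighted carry-over in which each chunk is clustered together with the previous prototypes, weighted by the membership mass they accumulated. All function names and parameters (`memberships`, `weighted_kfcm`, `incremental_kfcm`, `sigma`, `m`) are hypothetical.

```python
import numpy as np

def gaussian_kernel(X, V, sigma):
    # Pairwise K(x, v) = exp(-||x - v||^2 / sigma^2), shape (n_points, n_clusters).
    d2 = ((X[:, None, :] - V[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / sigma ** 2)

def memberships(X, V, m, sigma):
    # Fuzzy memberships from the kernel-induced distance, proportional to 1 - K.
    dist = np.maximum(1.0 - gaussian_kernel(X, V, sigma), 1e-12)
    U = dist ** (-1.0 / (m - 1.0))
    return U / U.sum(axis=1, keepdims=True)

def weighted_kfcm(X, w, V, m=2.0, sigma=5.0, n_iter=100, tol=1e-6):
    # Weighted kernel fuzzy c-means; w holds one non-negative weight per point.
    for _ in range(n_iter):
        U = memberships(X, V, m, sigma)
        G = w[:, None] * (U ** m) * gaussian_kernel(X, V, sigma)
        V_new = (G.T @ X) / (G.sum(axis=0)[:, None] + 1e-12)
        shift = np.abs(V_new - V).max()
        V = V_new
        if shift < tol:
            break
    return V, memberships(X, V, m, sigma)

def farthest_point_init(X, n_clusters, rng):
    # Assumed initialization: one random point, then repeatedly the farthest point.
    idx = [int(rng.integers(len(X)))]
    for _ in range(n_clusters - 1):
        d2 = ((X[:, None, :] - X[idx][None, :, :]) ** 2).sum(axis=2).min(axis=1)
        idx.append(int(d2.argmax()))
    return X[idx]

def incremental_kfcm(chunks, n_clusters, m=2.0, sigma=5.0, seed=0):
    # Process the data chunk by chunk, carrying only weighted prototypes forward.
    rng = np.random.default_rng(seed)
    V, wV = None, None
    for chunk in chunks:
        chunk = np.asarray(chunk, dtype=float)
        if V is None:
            X, w = chunk, np.ones(len(chunk))
            V0 = farthest_point_init(chunk, n_clusters, rng)
        else:
            # Previous prototypes re-enter as weighted points next to the new chunk.
            X = np.vstack([V, chunk])
            w = np.concatenate([wV, np.ones(len(chunk))])
            V0 = V
        V, U = weighted_kfcm(X, w, V0, m, sigma)
        wV = (w[:, None] * U).sum(axis=0)  # membership mass summarised by each prototype
    return V
```

Calling `incremental_kfcm(np.array_split(data, 4), n_clusters=2, sigma=10.0)` on an array `data` of shape `(n_samples, n_features)` would return one prototype per cluster. Only the prototypes and their weights persist between chunks, which is what keeps memory use bounded on large data in this kind of scheme.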

Acknowledgements

This work was supported by the Science and Technology Program of Colleges and Universities of Shandong Province (J15LN11), China; the Key Projects of the Education Department of Shandong Province (C2016M058), China; the Key Projects of the Education Department of Sichuan Province (16ZA0090), China; the National Social Science Fund Project (17BGL058), China; and the Humanity and Social Science Research Foundation of the Ministry of Education (15YJA790051), China.

Author information

Corresponding author

Correspondence to Lina Hao.

About this article

Cite this article

Zhang, C., Hao, L. & Fan, L. Optimization and improvement of data mining algorithm based on efficient incremental kernel fuzzy clustering for large data. Cluster Comput 22 (Suppl 2), 3001–3010 (2019). https://doi.org/10.1007/s10586-018-1767-1

  • DOI: https://doi.org/10.1007/s10586-018-1767-1
