Abstract
Recently, the term big data has become popular; it refers to heterogeneous data in massive quantities that are updated and multiplied every fraction of a second. This paper discusses the application of big data and its impact on the medical domain. Big data models and methods are used seamlessly to manage exponential data growth in the healthcare domain. At present, it remains difficult to foresee how machine learning and big data will affect the medical field. At the same time, the number of people suffering from diabetes mellitus (DM) at various healthcare centers has increased significantly. This study develops a new MapReduce-based optimal data classifier (MRODC) technique to diagnose DM efficiently. The presented MRODC model involves different stages: the Hadoop ecosystem, data acquisition, and classification based on a gradient boosting tree (GBT). To further improve the classification results of the GBT, an improved K-means clustering approach is integrated into it. Traditional K-means clustering relies on a randomly generated seed value, which greatly affects the clustering outcome. The improved K-means clustering introduces a new mechanism that sets the seed value based on the minimal clustering error (CE). Detailed experiments are carried out on the benchmark PIMA Indians Diabetes dataset under several aspects. The simulation outcomes show that the presented MRODC model produces consistently better results than the compared methods, with a maximum precision of 99.23, recall of 97.48, accuracy of 97.79, F-score of 98.34, and κ value of 95.02.
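The seed-selection idea can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not define the clustering error, so the sketch assumes CE is the within-cluster sum of squared distances, and it assumes the seed is chosen by running K-means once per candidate seed and keeping the run with the smallest CE. The function names `kmeans` and `best_seed_kmeans`, the toy points, and the candidate seed range are all hypothetical.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, seed, iterations=20):
    """Plain Lloyd's K-means started from a given random seed.

    Returns (centroids, clustering_error), where the clustering error is
    ASSUMED here to be the within-cluster sum of squared distances.
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # seed-dependent initial centroids
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to its cluster mean; keep the old one if the cluster is empty.
        centroids = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    error = sum(min(squared_distance(p, c) for c in centroids) for p in points)
    return centroids, error


def best_seed_kmeans(points, k, candidate_seeds):
    """Try every candidate seed and keep the run with the minimal clustering error."""
    best = None
    for seed in candidate_seeds:
        centroids, error = kmeans(points, k, seed)
        if best is None or error < best[2]:
            best = (seed, centroids, error)
    return best


# Toy data (hypothetical): two well-separated groups of 2-D points.
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
seed, centroids, error = best_seed_kmeans(points, k=2, candidate_seeds=range(10))
print("chosen seed:", seed, "clustering error:", error)
```

In a pipeline like the one described, the resulting cluster assignments would feed the GBT classifier; that step is omitted here because the abstract does not specify how the two are combined.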
Funding
Not applicable.
Ethics declarations
Conflict of interest
The authors declare no conflict of interest.
Availability of data and material (data transparency)
Not available.
Code availability (software application or custom code)
Not available.
About this article
Cite this article
Selvi, R.T., Muthulakshmi, I. Modelling the map reduce based optimal gradient boosted tree classification algorithm for diabetes mellitus diagnosis system. J Ambient Intell Human Comput 12, 1717–1730 (2021). https://doi.org/10.1007/s12652-020-02242-1
DOI: https://doi.org/10.1007/s12652-020-02242-1