Modelling the map reduce based optimal gradient boosted tree classification algorithm for diabetes mellitus diagnosis system

  • Original Research
Journal of Ambient Intelligence and Humanized Computing

Abstract

In recent years, the term big data has become popular. It refers to heterogeneous, massive volumes of data that are updated and multiplied every fraction of a second. This paper discusses the application of big data and its impact on the medical domain. Big data models and methods are now used seamlessly to manage the exponential growth of data in the healthcare domain. At present, it is still difficult to envision how machine learning and big data will affect the medical field. At the same time, the number of people with diabetes mellitus (DM) seen at healthcare centers has risen significantly. This study develops a new map reduce based optimal data classifier (MRODC) technique to diagnose DM efficiently. The presented MRODC model involves several stages: the Hadoop ecosystem, data acquisition, and classification based on a gradient boosting tree (GBT). To further improve the classification results of the GBT, an improved K-means clustering approach is integrated into it. Traditional K-means clustering generates the seed value randomly, which strongly affects the clustering outcome. In the improved K-means clustering, a new mechanism sets the seed value based on the minimal clustering error (CE). Detailed experiments are carried out on the benchmark PIMA Indians Diabetes dataset, evaluating several aspects. The simulation outcomes show that the presented MRODC model consistently outperforms the compared methods, with a maximum precision of 99.23, recall of 97.48, accuracy of 97.79, F-score of 98.34, and κ value of 95.02.
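The abstract states only that the improved K-means sets its seed by minimal clustering error (CE); it does not define CE or describe the search. The Python sketch below illustrates one plausible reading under stated assumptions: CE is taken to be the within-cluster sum of squared distances, and the seed is chosen by running standard K-means once per candidate seed and keeping the lowest-CE run. The function names (`kmeans`, `min_ce_kmeans`) and the candidate-seed list are hypothetical, not taken from the paper, and the sketch leaves out the Hadoop/MapReduce stage and how the clusters feed the GBT classifier.

```python
import random
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(p: Point, centroids: Sequence[Point]) -> int:
    """Index of the centroid closest to p."""
    return min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))


def kmeans(points: List[Point], k: int, seed: int, max_iter: int = 100):
    """Plain Lloyd's K-means whose initial centroids depend only on `seed`.

    Returns (centroids, labels, ce). ce is the clustering error, ASSUMED
    here to be the within-cluster sum of squared distances.
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # seed-dependent random initialisation
    for _ in range(max_iter):
        clusters: List[List[Point]] = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        new_centroids = []
        for j, members in enumerate(clusters):
            if members:
                dim = len(members[0])
                new_centroids.append(
                    tuple(sum(m[d] for m in members) / len(members) for d in range(dim))
                )
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid
        if new_centroids == centroids:  # converged: assignments will not change
            break
        centroids = new_centroids
    labels = [nearest(p, centroids) for p in points]
    ce = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return centroids, labels, ce


def min_ce_kmeans(points: List[Point], k: int, seeds: Sequence[int]):
    """Hypothetical seed selection: run K-means per seed, keep the minimal-CE run."""
    best: Optional[tuple] = None
    for s in seeds:
        centroids, labels, ce = kmeans(points, k, s)
        if best is None or ce < best[3]:
            best = (s, centroids, labels, ce)
    return best  # (seed, centroids, labels, ce)


if __name__ == "__main__":
    data = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
    seed, centroids, labels, ce = min_ce_kmeans(data, k=2, seeds=range(10))
    print(f"chosen seed={seed}, labels={labels}, CE={ce:.3f}")
```

Keeping the lowest-error run over several initialisations is also what scikit-learn's `KMeans` does through its `n_init` parameter, selecting the run with the lowest `inertia_`; plain Python is used here only to keep the sketch dependency-free.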

Figs. 1–10

Funding

Not applicable.

Author information

Authors

R. Thanga Selvi, I. Muthulakshmi

Corresponding author

Correspondence to R. Thanga Selvi.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Availability of data and material (data transparency)

Not available.

Code availability (software application or custom code)

Not available.

About this article

Cite this article

Selvi, R.T., Muthulakshmi, I. Modelling the map reduce based optimal gradient boosted tree classification algorithm for diabetes mellitus diagnosis system. J Ambient Intell Human Comput 12, 1717–1730 (2021). https://doi.org/10.1007/s12652-020-02242-1

  • DOI: https://doi.org/10.1007/s12652-020-02242-1
