Using Data Mining Techniques in Monitoring Diabetes Care. The Simpler the Better?

Gregori, Dario; Petrinco, Michele; Bo, Simona; Rosato, Rosalba; Pagano, Eva; Berchialla, Paola; Merletti, Franco

doi:10.1007/s10916-009-9363-9

Original Paper
Published: 10 September 2009

Volume 35, pages 277–281, (2011)
Journal of Medical Systems

Dario Gregori¹,
Michele Petrinco²,
Simona Bo³,
Rosalba Rosato²,
Eva Pagano²,
Paola Berchialla⁴ &
Franco Merletti²

Abstract

We aim to evaluate how data-mining statistical techniques can be applied to medical records and administrative data on diabetes, and how they differ in their ability to predict outcomes (e.g. death). Data were available for 3,892 outpatients with a diagnosis of type 2 diabetes from the San Giovanni Battista Hospital in Torino. Six statistical classifiers were applied: Logistic Regression (LR), Generalized Additive Model (GAM), Projection Pursuit Regression (PPR), Linear Discriminant Analysis (LDA), Quadratic Discriminant Analysis (QDA), and Artificial Neural Networks (ANN). All models selected the same subset of covariates. ANN is the worst-performing model, whereas simpler models such as LR, GAM and LDA seem to perform better. GAM is associated with a very small misclassification rate. The agreement among models in predicting individual outcomes is 0.23 (Kappa, SE 0.06). Monitoring on the basis of patients’ characteristics is highly dependent on the statistical properties of the chosen model.
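To make the agreement figure concrete, the following is a minimal sketch of how chance-corrected agreement among several classifiers could be computed. It assumes Fleiss' multi-rater kappa; the abstract does not state which Kappa variant the authors used. The function names and the toy predictions are hypothetical and are not taken from the study data.

```python
from typing import Sequence


def misclassification_rate(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of patients whose predicted outcome differs from the observed one."""
    errors = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return errors / len(y_true)


def fleiss_kappa(predictions: Sequence[Sequence[int]], n_categories: int = 2) -> float:
    """Fleiss' kappa, treating each model as a 'rater' (an assumption).

    predictions[i][j] is the class (0 .. n_categories-1) that model j
    assigns to patient i. Every patient must be scored by every model.
    """
    n_subjects = len(predictions)
    n_raters = len(predictions[0])

    # counts[i][k]: how many models assign patient i to class k
    counts = [[list(row).count(k) for k in range(n_categories)] for row in predictions]

    # Observed agreement: proportion of agreeing model pairs, averaged over patients
    p_i = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_i) / n_subjects

    # Agreement expected by chance, from the overall class proportions
    p_k = [
        sum(row[k] for row in counts) / (n_subjects * n_raters)
        for k in range(n_categories)
    ]
    p_e = sum(p * p for p in p_k)

    if p_e == 1.0:
        # All models predict a single class for everyone; kappa is undefined
        return float("nan")
    return (p_bar - p_e) / (1.0 - p_e)


# Hypothetical toy example: 4 patients scored by 3 models (0 = alive, 1 = dead)
toy_predictions = [
    [0, 0, 1],
    [1, 1, 1],
    [0, 1, 0],
    [0, 0, 0],
]
toy_kappa = fleiss_kappa(toy_predictions)
```

A kappa near 1 would indicate that the models classify the same individuals in the same way, while a value near 0 indicates agreement no better than chance. Models with similar overall misclassification rates can still disagree on which patients they misclassify.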

Author information

Authors and Affiliations

Laboratories of Epidemiological Methods and Biostatistics, Department of Environmental Medicine and Public Health, University of Padova, Via Loredan 18, 35121, Padova, Italy
Dario Gregori
Unit of Cancer Epidemiology, University of Torino, and CPO Piemonte, Turin, Italy
Michele Petrinco, Rosalba Rosato, Eva Pagano & Franco Merletti
Department of Internal Medicine, University of Torino, Turin, Italy
Simona Bo
Department of Public Health and Microbiology, University of Torino, Turin, Italy
Paola Berchialla

Corresponding author

Correspondence to Dario Gregori.

About this article

Cite this article

Gregori, D., Petrinco, M., Bo, S. et al. Using Data Mining Techniques in Monitoring Diabetes Care. The Simpler the Better? J Med Syst 35, 277–281 (2011). https://doi.org/10.1007/s10916-009-9363-9

Received: 06 May 2009
Accepted: 10 August 2009
Published: 10 September 2009
Issue Date: April 2011
DOI: https://doi.org/10.1007/s10916-009-9363-9
