Abstract
Diabetes is a metabolic disorder that is strongly affected by lifestyle. The disease cannot be cured, but it can be controlled, which minimizes complications such as heart disease, stroke and blindness. Clinicians routinely collect large amounts of information on diabetic patients as part of the day-to-day management of the disease. We investigate the potential of data mining to spot trends in these data and to predict outcome. Feature selection is used to improve the efficiency of the data mining algorithms and to identify the contribution of different features to the prediction of diabetes control status. Decision trees provide classification accuracy of over 78%. However, while most bad control cases (90%) are correctly classified, at least 50% of good control cases are misclassified, which means that the current feature selection and prediction models show some potential but need additional refinement.
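The three figures in the abstract are linked: overall accuracy is the class-weighted mean of the per-class correct-classification rates. The sketch below is a minimal illustration of that arithmetic, not the authors' evaluation code. It assumes the boundary values (78%, 90%, 50%) can be treated as exact, which the abstract does not claim, and the function names are hypothetical. The class split is not stated in the abstract; it is only inferred here from those figures.

```python
def overall_accuracy(recall_bad, recall_good, frac_bad):
    """Accuracy as the class-weighted mean of the per-class recall."""
    return recall_bad * frac_bad + recall_good * (1.0 - frac_bad)


def implied_frac_bad(accuracy, recall_bad, recall_good):
    """Solve accuracy = recall_bad * p + recall_good * (1 - p) for p."""
    if recall_bad == recall_good:
        raise ValueError("class share is not identifiable when recalls are equal")
    return (accuracy - recall_good) / (recall_bad - recall_good)


# Boundary figures from the abstract, treated as exact for illustration only.
RECALL_BAD = 0.90   # share of bad control cases correctly classified
RECALL_GOOD = 0.50  # share of good control cases correctly classified
ACCURACY = 0.78     # overall classification accuracy

# Share of bad control cases implied by the three figures (an inference,
# not a number reported in the abstract).
p_bad = implied_frac_bad(ACCURACY, RECALL_BAD, RECALL_GOOD)

# Accuracy of a trivial model that always predicts the majority class.
majority_baseline = max(p_bad, 1.0 - p_bad)

# Mean of the two per-class recalls, which ignores class imbalance.
balanced_accuracy = (RECALL_BAD + RECALL_GOOD) / 2.0

print(f"implied share of bad control cases: {p_bad:.2f}")
print(f"majority-class baseline accuracy:   {majority_baseline:.2f}")
print(f"balanced accuracy:                  {balanced_accuracy:.2f}")
```

Under these assumptions the implied share of bad control cases is about 0.70, so a model that always predicts bad control would already reach roughly 70% accuracy. This is consistent with the abstract's conclusion that the models show potential but need refinement.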
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Huang, Y., McCullagh, P., Black, N., Harper, R. (2004). Evaluation of Outcome Prediction for a Clinical Diabetes Database. In: López, J.A., Benfenati, E., Dubitzky, W. (eds) Knowledge Exploration in Life Science Informatics. KELSI 2004. Lecture Notes in Computer Science, vol 3303. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30478-4_16
DOI: https://doi.org/10.1007/978-3-540-30478-4_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23927-7
Online ISBN: 978-3-540-30478-4
eBook Packages: Springer Book Archive