Data Preprocessing for Decision Making in Medical Informatics: Potential and Analysis

  • Conference paper
Trends and Advances in Information Systems and Technologies (WorldCIST'18 2018)

Part of the book series: Advances in Intelligent Systems and Computing ((AISC,volume 746))

Abstract

Clinical databases often contain noisy, inconsistent, missing, imbalanced and high-dimensional data. These problems may reduce the performance of data mining (DM) techniques. Data preprocessing is therefore an essential step that makes medical datasets suitable for DM algorithms. The objective of this work is to carry out a systematic mapping study that reviews the use of preprocessing techniques on clinical datasets. As a result, 110 papers published between January 2000 and March 2017 were selected, analyzed and classified according to publication year and channel, research type and the preprocessing tasks used. The study shows that researchers have paid considerable attention to preprocessing in medical DM over the last decade, and that a significant number of the selected studies used data reduction and data cleaning tasks.
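As a rough illustration of the two preprocessing tasks the abstract singles out, the sketch below applies one data cleaning step (median imputation of a missing value) and one data reduction step (dropping a column that never varies) to a toy dataset. The records, column names and helper functions are hypothetical and are not taken from the paper or from any of the reviewed studies; real clinical pipelines would typically need more careful, domain-informed choices.

```python
from statistics import median


def impute_median(rows, column):
    """Data cleaning: replace missing (None) values in `column`
    with the median of the observed values. Returns new rows."""
    observed = [r[column] for r in rows if r[column] is not None]
    fill = median(observed)
    return [
        {**r, column: fill if r[column] is None else r[column]}
        for r in rows
    ]


def drop_constant_columns(rows):
    """Data reduction: drop columns whose value is identical in
    every row, since they carry no information for mining."""
    keep = [c for c in rows[0] if len({r[c] for r in rows}) > 1]
    return [{c: r[c] for c in keep} for r in rows]


# Hypothetical toy records (assumption, for illustration only).
records = [
    {"age": 63, "systolic_bp": 145, "unit": "mmHg", "label": 1},
    {"age": 41, "systolic_bp": None, "unit": "mmHg", "label": 0},
    {"age": 57, "systolic_bp": 130, "unit": "mmHg", "label": 0},
]

cleaned = impute_median(records, "systolic_bp")
reduced = drop_constant_columns(cleaned)
```

Both helpers return new lists instead of modifying `records`, so the raw data stays available for comparison with the preprocessed version.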

Acknowledgements

This research is part of the project PPR1/09 “mPHR in Morocco”, financed by the Ministry of Higher Education and Scientific Research in Morocco and CNRST (2015-2017), and part of the GINSENG project (TIN2015-70259-C2-2-R), supported by the Spanish Ministry of Economy and Competitiveness and European FEDER funds.

Corresponding author

Correspondence to A. Idri.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper

Cite this paper

Benhar, H., Idri, A., Fernández-Alemán, J.L. (2018). Data Preprocessing for Decision Making in Medical Informatics: Potential and Analysis. In: Rocha, Á., Adeli, H., Reis, L., Costanzo, S. (eds) Trends and Advances in Information Systems and Technologies. WorldCIST'18 2018. Advances in Intelligent Systems and Computing, vol 746. Springer, Cham. https://doi.org/10.1007/978-3-319-77712-2_116

  • DOI: https://doi.org/10.1007/978-3-319-77712-2_116

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-77711-5

  • Online ISBN: 978-3-319-77712-2

  • eBook Packages: Engineering, Engineering (R0)
