Skip to main content

Investigation of the Impact of Missing Value Imputation Methods on the k-NN Classification Accuracy

  • Conference paper
  • First Online:
Computational Collective Intelligence

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 9330))

  • 2310 Accesses

Abstract

Paper desribes results of an experiment where various scenarios of missing values occurrence in the data repository has been tested. Experiment was coducted on a publicly available database, containing complete, multidimensional continuous dataspace and multiple classes. Missing values were introduced using “completely at random” scheme. Tested scenarios were: training and testing using incomplete dataset, training on complete data set and testing on incomplete and vice versa. For comparison to data imputation methods also the ensemble of single-feature kNN classifiers, working withoud data imputation, has been tested.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Subscribe and save

Springer+ Basic
$34.99 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

Similar content being viewed by others

References

  1. Lichman, M.: UCI Machine Learning Repository. University of California, School of Information and Computer Science, Irvine (2013). (http://archive.ics.uci.edu/ml)

  2. Saar-Tsechansky, M., Provost, F., Caruana, R.: Handling missing values when applying classification models. Journal of Machine Learning Research 8, 1217–1250 (2007)

    MATH  Google Scholar 

  3. Berthold, M.R., Cebron, N., Dill, F., Gabriel, T.R., Kötter, T., Meinl, T., Ohl, P., Sieb, C., Thiel, K., Wiswedel, B.: KNIME: The Konstanz Information Miner. Studies in Classification, Data Analysis, and Knowledge Organization (GfKL 2007). Springer (2007)

    Google Scholar 

  4. Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., Witten, I.H.: The WEKA Data Mining Software: An Update. SIGKDD Explorations 11(1) (2009)

    Google Scholar 

  5. R Core Team: R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria (2014)

    Google Scholar 

  6. Honaker, J., King, G., Blackwell, M.: Amelia II: A Program for Missing Data. Journal of Statistical Software 45(7), 1–47 (2011)

    Article  Google Scholar 

  7. Orczyk, T., Porwik, P., Bernas, M.: Medical Diagnosis Support System Based on the Ensemble of Single-Parameter Classifiers. Journal of Medical Informatics and Technologies 23, 173–180 (2014)

    Google Scholar 

  8. Wozniak, M., Krawczyk, B.: Combined classifier based on feature space partitioning. International Journal of Applied Mathematics and Computer Science 22(4), 855–866 (2012)

    Article  MathSciNet  Google Scholar 

  9. Little, R.J.A., Rubin, D.B.: Statistical Analysis with Missing Data. John Wiley & Sons, New York (1987)

    MATH  Google Scholar 

  10. Schafer, J.L.: Analysis of Incomplete Multivariate Data. Chapman and Hall/CRC (1997)

    Google Scholar 

  11. Porwik, P., Sosnowski, M., Wesolowski, T., Wrobel, K.: A computational assessment of a blood vessel’s compliance: a procedure based on computed tomography coronary angiography. In: Corchado, E., Kurzyński, M., Woźniak, M. (eds.) HAIS 2011, Part I. LNCS, vol. 6678, pp. 428–435. Springer, Heidelberg (2011)

    Chapter  Google Scholar 

  12. Doroz, R., Porwik, P.: Handwritten signature recognition with adaptive selection of behavioral features. In: Chaki, N., Cortesi, A. (eds.) CISIM 2011. CCIS, vol. 245, pp. 128–136. Springer, Heidelberg (2011)

    Chapter  Google Scholar 

  13. Foster, K.R., Koprowski, R., Skufca, J.D.: Machine learning, medical diagnosis, and biomedical engineering research – commentary. Biomedical Engineering Online 13, Article No. 94 (2014). doi: 10.1186/1475-925X-13-94

  14. Bernas, M., Orczyk, T., Porwik, P.: Fusion of Granular Computing and k –NN Classifiers for medical data support system. In: Nguyen, N.T., Trawiński, B., Kosala, R. (eds.) ACIIDS 2015. LNCS, vol. 9012, pp. 62–71. Springer, Heidelberg (2015)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Tomasz Orczyk .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Orczyk, T., Porwik, P. (2015). Investigation of the Impact of Missing Value Imputation Methods on the k-NN Classification Accuracy. In: Núñez, M., Nguyen, N., Camacho, D., Trawiński, B. (eds) Computational Collective Intelligence. Lecture Notes in Computer Science(), vol 9330. Springer, Cham. https://doi.org/10.1007/978-3-319-24306-1_54

Download citation

  • DOI: https://doi.org/10.1007/978-3-319-24306-1_54

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-24305-4

  • Online ISBN: 978-3-319-24306-1

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics