Abstract
This paper describes the results of an experiment in which various scenarios of missing-value occurrence in a data repository were tested. The experiment was conducted on a publicly available database containing a complete, multidimensional, continuous data space with multiple classes. Missing values were introduced using the missing completely at random (MCAR) scheme. The tested scenarios were: training and testing on an incomplete dataset; training on a complete dataset and testing on an incomplete one; and the reverse, training on an incomplete dataset and testing on a complete one. For comparison with the data imputation methods, an ensemble of single-feature kNN classifiers, which works without data imputation, was also tested.
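The abstract does not give implementation details, so the following is only a rough sketch under stated assumptions, written in plain Python with the standard library. It shows the two ingredients named above: MCAR masking, where every value is dropped independently with the same probability, and an ensemble of single-feature kNN classifiers that ignores missing features instead of imputing them. The function names (`inject_mcar`, `knn_single_feature`, `ensemble_predict`), the absolute-difference distance on a single feature, and the plain majority vote used to combine the ensemble members are illustrative assumptions, not the authors' published method.

```python
import random
from collections import Counter


def inject_mcar(rows, rate, seed=0):
    """Return a copy of `rows` in which each value is independently replaced
    by None with probability `rate` (missing completely at random)."""
    rng = random.Random(seed)
    return [[None if rng.random() < rate else v for v in row] for row in rows]


def knn_single_feature(train_x, train_y, feature, value, k):
    """Label predicted by a kNN classifier that sees only one feature.

    Assumption: training samples whose value for this feature is missing
    are skipped; distance is the absolute difference on that feature.
    Ties in the vote go to the label of the nearest neighbour.
    """
    pairs = [(abs(x[feature] - value), y)
             for x, y in zip(train_x, train_y) if x[feature] is not None]
    if not pairs:
        return None
    pairs.sort(key=lambda p: p[0])
    votes = Counter(y for _, y in pairs[:k])
    return votes.most_common(1)[0][0]


def ensemble_predict(train_x, train_y, sample, k=3):
    """Majority vote over one single-feature kNN per observed feature.

    Features missing in `sample` are ignored, so no imputation is needed.
    Assumption: simple majority voting; ties go to the lower-index feature.
    Returns None if no feature of `sample` is observed.
    """
    votes = Counter()
    for f, value in enumerate(sample):
        if value is None:
            continue
        label = knn_single_feature(train_x, train_y, f, value, k)
        if label is not None:
            votes[label] += 1
    return votes.most_common(1)[0][0] if votes else None


# Hypothetical toy data: two continuous features, two classes.
train_x = [[1.0, 10.0], [1.2, 11.0], [5.0, 50.0], [5.3, 52.0]]
train_y = ["a", "a", "b", "b"]
print(ensemble_predict(train_x, train_y, [1.1, None], k=1))   # a
print(ensemble_predict(train_x, train_y, [None, 51.0], k=1))  # b
```

Because each member uses a single feature, a test sample with missing values is still classified by whichever members see an observed feature; this is the property that lets such an ensemble be compared against imputation followed by an ordinary kNN.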
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Orczyk, T., Porwik, P. (2015). Investigation of the Impact of Missing Value Imputation Methods on the k-NN Classification Accuracy. In: Núñez, M., Nguyen, N., Camacho, D., Trawiński, B. (eds) Computational Collective Intelligence. Lecture Notes in Computer Science, vol 9330. Springer, Cham. https://doi.org/10.1007/978-3-319-24306-1_54
DOI: https://doi.org/10.1007/978-3-319-24306-1_54
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-24305-4
Online ISBN: 978-3-319-24306-1
eBook Packages: Computer Science, Computer Science (R0)