Investigation of the Impact of Missing Value Imputation Methods on the k-NN Classification Accuracy

Orczyk, Tomasz; Porwik, Piotr

doi:10.1007/978-3-319-24306-1_54

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9330)

Abstract

This paper describes the results of an experiment in which various scenarios of missing-value occurrence in a data repository were tested. The experiment was conducted on a publicly available database containing a complete, multidimensional, continuous data space and multiple classes. Missing values were introduced using a “completely at random” scheme. The tested scenarios were: training and testing on an incomplete dataset, training on a complete dataset and testing on an incomplete one, and vice versa. For comparison with the data imputation methods, an ensemble of single-feature k-NN classifiers, working without data imputation, was also tested.
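The experimental setup summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract names neither the imputation methods nor the ensemble's voting rule, so mean imputation and simple majority voting are assumptions made here for illustration. The function names (`mask_mcar`, `mean_impute`, `knn_predict`, `single_feature_ensemble_predict`) are likewise hypothetical.

```python
import random
from collections import Counter


def mask_mcar(rows, rate, seed=0):
    """Copy `rows`, replacing each value with None independently with
    probability `rate` (a "completely at random" missingness scheme)."""
    rng = random.Random(seed)
    return [[None if rng.random() < rate else v for v in row] for row in rows]


def mean_impute(rows, reference=None):
    """Fill None entries with per-feature means taken from `reference`
    (defaults to `rows`). Mean imputation is an assumed stand-in method."""
    reference = rows if reference is None else reference
    means = []
    for j in range(len(rows[0])):
        observed = [r[j] for r in reference if r[j] is not None]
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return [[means[j] if v is None else v for j, v in enumerate(row)]
            for row in rows]


def knn_predict(train_x, train_y, query, k=3):
    """Plain k-NN on complete vectors using squared Euclidean distance."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(x, query)), i)
        for i, x in enumerate(train_x)
    )
    votes = Counter(train_y[i] for _, i in dists[:k])
    return votes.most_common(1)[0][0]


def single_feature_ensemble_predict(train_x, train_y, query, k=3):
    """One 1-D k-NN per feature; features missing in the query are skipped
    and training rows missing that feature are ignored, so no imputation
    is needed. The per-feature decisions are combined by majority vote
    (an assumed combination rule)."""
    votes = Counter()
    for j, q in enumerate(query):
        if q is None:
            continue
        candidates = sorted(
            (abs(x[j] - q), i) for i, x in enumerate(train_x)
            if x[j] is not None
        )
        if not candidates:
            continue
        local = Counter(train_y[i] for _, i in candidates[:k])
        votes[local.most_common(1)[0][0]] += 1
    return votes.most_common(1)[0][0] if votes else None
```

Under these assumptions, the three scenarios from the abstract correspond to applying `mask_mcar` to the training set, the test set, or both, and then either imputing before calling `knn_predict` or passing the incomplete vectors directly to `single_feature_ensemble_predict`.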

Author information

Authors and Affiliations

Institute of Computer Science, University of Silesia in Katowice, Bedzinska 39, Sosnowiec, Poland
Tomasz Orczyk & Piotr Porwik

Corresponding author

Correspondence to Tomasz Orczyk.

Editor information

Editors and Affiliations

Universidad Complutense de Madrid, Madrid, Spain
Manuel Núñez
Wroclaw University of Technology, Wroclaw, Poland
Ngoc Thanh Nguyen
Computer Science Department, Universidad Autónoma De Madrid, Madrid, Spain
David Camacho
Wroclaw University of Technology, Wroclaw, Poland
Bogdan Trawiński

About this paper

Cite this paper

Orczyk, T., Porwik, P. (2015). Investigation of the Impact of Missing Value Imputation Methods on the k-NN Classification Accuracy. In: Núñez, M., Nguyen, N., Camacho, D., Trawiński, B. (eds) Computational Collective Intelligence. Lecture Notes in Computer Science, vol 9330. Springer, Cham. https://doi.org/10.1007/978-3-319-24306-1_54

DOI: https://doi.org/10.1007/978-3-319-24306-1_54
Published: 24 October 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-24305-4
Online ISBN: 978-3-319-24306-1
eBook Packages: Computer Science, Computer Science (R0)