An effective method for classification with missing values

Applied Intelligence

Abstract

Classification is one of the most important tasks in machine learning, with a huge number of real-life applications. In many practical classification problems, the information available for classifying an object is partial or incomplete because some attribute values are missing for various reasons. These missing values can significantly reduce the efficacy of the classification model, so it is crucial to develop effective techniques to impute them. A number of methods have been introduced for solving the classification problem with missing values; however, they have various limitations. We therefore introduce an effective method for imputing missing values using the correlation among the attributes. Other methods that use correlation to impute missing values work well for either categorical or numeric data but not both, or are designed for a particular application only. Moreover, they do not work if every record has at least one missing attribute value. Our method, Model based Missing value Imputation using Correlation (MMIC), can effectively impute both categorical and numeric data. It uses a model-based technique that fills in the missing values attribute by attribute and then effectively reuses them through the model. Extensive performance analyses show that the proposed approach achieves high performance in imputing missing values and thus increases the efficacy of the classifier. The experimental results also show that our method outperforms various existing methods for handling missing data in classification.
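The abstract describes MMIC only at a high level, so what follows is not the authors' algorithm. It is a minimal, hypothetical Python sketch of the general idea of correlation-guided, attribute-wise, model-based imputation, restricted to numeric attributes. All of the following are our assumptions: missing values are encoded as `None`; each incomplete attribute is modelled by a simple least-squares line on its single most strongly correlated attribute (Pearson |r|), estimated from pairwise-complete rows; attributes with fewer missing values are processed first, and imputed values are written back so later attributes can reuse them; the column mean is used when no usable predictor exists. Because each model is fitted on pairwise-complete rows, the sketch still runs when every record has at least one missing value. The names `impute`, `min_pairs`, `_pairs` and `_fit` are illustrative only.

```python
from typing import List, Optional

Row = List[Optional[float]]  # one record; None marks a missing value (assumption)


def _pairs(data: List[Row], i: int, j: int):
    """Return (x, y) pairs from rows where both column i and column j are observed."""
    return [(r[i], r[j]) for r in data if r[i] is not None and r[j] is not None]


def _fit(pairs):
    """Least-squares fit y = a + b*x; also return Pearson |r| (0 if undefined)."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    if sxx == 0 or syy == 0:
        return my, 0.0, 0.0  # constant column: no usable correlation
    b = sxy / sxx
    return my - b * mx, b, abs(sxy) / (sxx * syy) ** 0.5


def impute(data: List[Row], min_pairs: int = 3) -> List[Row]:
    """Hypothetical sketch: fill numeric gaps attribute by attribute.

    Each incomplete attribute is predicted from its most correlated other
    attribute, fitted on pairwise-complete rows, so no fully complete record
    is required. Imputed values are written back and reused for later attributes.
    """
    data = [list(r) for r in data]  # work on a copy of the input
    m = len(data[0])
    # Visit attributes with the fewest missing values first.
    order = sorted(range(m), key=lambda j: sum(r[j] is None for r in data))
    for j in order:
        observed = [r[j] for r in data if r[j] is not None]
        if not observed or len(observed) == len(data):
            continue  # nothing to learn from, or nothing to fill
        mean = sum(observed) / len(observed)
        best = None  # (|r|, predictor index, intercept, slope)
        for i in range(m):
            if i == j:
                continue
            p = _pairs(data, i, j)
            if len(p) < min_pairs:
                continue  # too few co-observed rows to fit a model
            a, b, r = _fit(p)
            if best is None or r > best[0]:
                best = (r, i, a, b)
        for row in data:
            if row[j] is None:
                if best is not None and best[0] > 0 and row[best[1]] is not None:
                    row[j] = best[2] + best[3] * row[best[1]]
                else:
                    row[j] = mean  # fallback when no usable predictor value exists
    return data
```

For example, on a toy table whose three columns follow x, 2x and 3x and in which every row lacks exactly one value, `impute` fills each gap with the value implied by the linear relationship. Categorical attributes would need a different association measure (for example Cramér's V) and a classifier in place of the regression line.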


Acknowledgments

We would like to express our deep gratitude to the anonymous reviewers of this paper. The useful comments have played a significant role in improving the quality of this work.

Author information

Correspondence to Chowdhury Farhan Ahmed.


About this article


Cite this article

Zahin, S.A., Ahmed, C.F. & Alam, T. An effective method for classification with missing values. Appl Intell 48, 3209–3230 (2018). https://doi.org/10.1007/s10489-018-1139-9


  • DOI: https://doi.org/10.1007/s10489-018-1139-9
