An effective method for classification with missing values

Zahin, Sabit Anwar; Ahmed, Chowdhury Farhan; Alam, Tahira

doi:10.1007/s10489-018-1139-9

Published: 17 February 2018

Applied Intelligence, volume 48, pages 3209–3230 (2018)

Sabit Anwar Zahin¹,
Chowdhury Farhan Ahmed¹,² &
Tahira Alam¹

Abstract

Classification is one of the most important tasks in machine learning, with a huge number of real-life applications. In many practical classification problems, the information available for classifying an object is partial or incomplete because some attribute values are missing for various reasons. These missing values can significantly affect the efficacy of the classification model, so it is crucial to develop effective techniques to impute them. A number of methods have been introduced for solving the classification problem with missing values. However, these methods have various limitations. We therefore introduce an effective method for imputing missing values using the correlation among the attributes. Other methods that consider correlation when imputing missing values either work better for only one of categorical or numeric data, or are designed for a particular application only. Moreover, they do not work if every record has at least one missing attribute. Our method, Model based Missing value Imputation using Correlation (MMIC), can effectively impute both categorical and numeric data. It uses a model-based technique that fills the missing values attribute by attribute and then reuses them through the model. Extensive performance analyses show that our proposed approach achieves high performance in imputing missing values and thus increases the efficacy of the classifier. The experimental results also show that our method outperforms various existing methods for handling missing data in classification.
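The general idea of imputing a missing value from a correlated attribute can be illustrated with a minimal sketch. This is not the MMIC algorithm, whose details are not given in the abstract. It is a simplified stand-in that handles numeric attributes only: for each attribute with gaps, it picks the other attribute with the strongest Pearson correlation and fits a one-variable least-squares line. The function names `pearson` and `impute_by_correlation` and the use of `None` to mark a missing value are assumptions made for this illustration.

```python
from math import sqrt
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists; 0.0 when undefined."""
    if len(xs) < 2:
        return 0.0
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def impute_by_correlation(rows):
    """Fill None entries column by column (hypothetical helper, numeric data only).

    For each column j with gaps, the most correlated other column k is found
    from rows where both are observed, and a line j = a + b * k is fitted.
    Rows where k is also missing fall back to the observed mean of column j.
    """
    n_cols = len(rows[0])
    filled = [list(r) for r in rows]
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        if not observed or len(observed) == len(rows):
            continue  # nothing to learn from, or nothing to fill
        col_mean = mean(observed)

        # Choose the predictor column with the largest |correlation|.
        best_k, best_r, best_pairs = None, 0.0, []
        for k in range(n_cols):
            if k == j:
                continue
            pairs = [(r[k], r[j]) for r in rows
                     if r[k] is not None and r[j] is not None]
            r_jk = pearson([p[0] for p in pairs], [p[1] for p in pairs])
            if abs(r_jk) > abs(best_r):
                best_k, best_r, best_pairs = k, r_jk, pairs

        slope = intercept = 0.0
        if best_k is not None:
            xs = [p[0] for p in best_pairs]
            ys = [p[1] for p in best_pairs]
            mx, my = mean(xs), mean(ys)
            slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
                     / sum((x - mx) ** 2 for x in xs))
            intercept = my - slope * mx

        for i, r in enumerate(rows):
            if r[j] is None:
                if best_k is not None and r[best_k] is not None:
                    filled[i][j] = intercept + slope * r[best_k]
                else:
                    filled[i][j] = col_mean
    return filled


data = [[1.0, 2.0], [2.0, 4.0], [3.0, None], [4.0, 8.0]]
completed = impute_by_correlation(data)
```

In this toy data the second attribute is exactly twice the first, so the gap in the third record is filled from the fitted line rather than from the column mean. The sketch predicts only from originally observed values and does not treat categorical attributes, both of which the abstract says MMIC addresses.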

Acknowledgments

We would like to express our deep gratitude to the anonymous reviewers of this paper. Their useful comments played a significant role in improving the quality of this work.

Author information

Authors and Affiliations

Department of Computer Science and Engineering, University of Dhaka, Dhaka, Bangladesh
Sabit Anwar Zahin, Chowdhury Farhan Ahmed & Tahira Alam
ICube Laboratory, University of Strasbourg, Strasbourg, France
Chowdhury Farhan Ahmed

Corresponding author

Correspondence to Chowdhury Farhan Ahmed.

About this article

Cite this article

Zahin, S.A., Ahmed, C.F. & Alam, T. An effective method for classification with missing values. Appl Intell 48, 3209–3230 (2018). https://doi.org/10.1007/s10489-018-1139-9

Published: 17 February 2018
Issue Date: October 2018
DOI: https://doi.org/10.1007/s10489-018-1139-9
