Information-decomposition-model-based missing value estimation for not missing at random dataset

Liu, Shigang; Dai, Honghua; Gan, Min

doi:10.1007/s13042-015-0354-5

Original Article
Published: 29 March 2015

Volume 9, pages 85–95 (2018)
International Journal of Machine Learning and Cybernetics

Shigang Liu¹,
Honghua Dai¹ &
Min Gan²

Abstract

Missing data estimation is an important strategy for improving performance when learning from incomplete data, especially when records with missing values cannot be discarded. However, most existing algorithms focus on data missing at random (MAR) or missing completely at random (MCAR), and less attention has been paid to data not missing at random (NMAR). In this paper, an information decomposition imputation (IDIM) algorithm using a fuzzy membership function is proposed to address the missing value problem under NMAR. First, the proposed IDIM algorithm is presented with detailed examples. Then, the approach is evaluated in extensive experiments against several typical algorithms. The experimental results demonstrate that the proposed algorithm achieves higher accuracy than the existing imputation approaches in terms of normalized root mean square error (NRMSE) and TP+TN evaluation under different missing strategies.
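To make the evaluation setting in the abstract concrete, the following is a minimal sketch of how an NMAR missing pattern and an NRMSE score might be computed. It does not implement IDIM. The function names `nrmse` and `nmar_mask`, the threshold-based missing mechanism, the probabilities `p_high` and `p_low`, and the choice to normalize by the standard deviation of the true values are all assumptions made for illustration; the article's exact definitions may differ.

```python
import math
import random


def nrmse(true_values, imputed_values):
    """RMSE between true and imputed values, divided by the standard
    deviation of the true values.

    Assumption: normalizing by the standard deviation is one common
    convention; the article may use a different normalization.
    """
    if len(true_values) != len(imputed_values) or not true_values:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(true_values)
    mean_true = sum(true_values) / n
    variance = sum((t - mean_true) ** 2 for t in true_values) / n
    if variance == 0:
        raise ValueError("true values have zero variance")
    mse = sum((t - p) ** 2 for t, p in zip(true_values, imputed_values)) / n
    return math.sqrt(mse / variance)


def nmar_mask(values, threshold, p_high=0.8, p_low=0.1, seed=0):
    """Hypothetical NMAR mechanism: a value above `threshold` is marked
    missing with probability `p_high`, any other value with `p_low`.
    The chance of being missing depends on the value itself."""
    rng = random.Random(seed)
    return [rng.random() < (p_high if v > threshold else p_low) for v in values]


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    mask = nmar_mask(data, threshold=3.5)
    observed = [v for v, m in zip(data, mask) if not m]
    hidden = [v for v, m in zip(data, mask) if m]
    if observed and len(set(hidden)) > 1:
        # Baseline: fill every hidden entry with the mean of the observed ones.
        fill = sum(observed) / len(observed)
        print("NRMSE of mean imputation:", nrmse(hidden, [fill] * len(hidden)))
```

Because large values are removed preferentially in this sketch, the mean of the observed entries underestimates the hidden ones, which is the kind of bias that makes NMAR harder than MCAR or MAR for simple imputation baselines.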

Author information

Authors and Affiliations

School of Information Technology, Deakin University, Melbourne, VIC, 3125, Australia
Shigang Liu & Honghua Dai
Planning and Institutional Performance Unit, Planning and Governance, La Trobe University, Melbourne, VIC, 3083, Australia
Min Gan

Corresponding author

Correspondence to Shigang Liu.

About this article

Cite this article

Liu, S., Dai, H. & Gan, M. Information-decomposition-model-based missing value estimation for not missing at random dataset. Int. J. Mach. Learn. & Cyber. 9, 85–95 (2018). https://doi.org/10.1007/s13042-015-0354-5

Received: 04 December 2014
Accepted: 18 March 2015
Published: 29 March 2015
Issue Date: January 2018
DOI: https://doi.org/10.1007/s13042-015-0354-5
