Abstract
Missing values are an important problem in data mining. To tackle this problem in classification tasks, we propose two imputation methods based on Bayesian networks. These methods are evaluated in the context of both prediction and classification tasks, and the results are compared with those achieved by classical imputation methods (Expectation–Maximization, Data Augmentation, Decision Trees, and Mean/Mode). Our simulations were performed on four datasets (Congressional Voting Records, Mushroom, Wisconsin Breast Cancer, and Adult), which are benchmarks for data mining methods. Missing values were simulated in these datasets by eliminating some known values, making it possible to assess the prediction capability of an imputation method by comparing the original values with the imputed ones. In addition, we propose a methodology to estimate the bias introduced by imputation methods in classification tasks. To this end, four classifiers (One Rule, Naïve Bayes, J4.8 Decision Tree, and PART) are used to evaluate the imputation methods in classification scenarios. The computing times consumed to perform imputations are also reported. The simulation results in terms of prediction, classification, and computing time allow several analyses, leading to interesting conclusions; in particular, Bayesian networks have proven competitive with classical imputation methods.
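The evaluation protocol described in the abstract (hiding known values, imputing them, and comparing the imputed values against the originals) can be sketched minimally as follows. This is an illustrative sketch, not the paper's implementation: the helper names are hypothetical, and simple mode imputation stands in for the baseline Mean/Mode method the paper compares against.

```python
import random
from collections import Counter

def simulate_missing(rows, col, frac, seed=0):
    """Hide a fraction of known values in one column.

    Returns the masked dataset and a record of the hidden (true) values,
    so the imputation can later be scored against the originals.
    """
    rng = random.Random(seed)
    idx = rng.sample(range(len(rows)), int(frac * len(rows)))
    truth = {i: rows[i][col] for i in idx}
    masked = [list(r) for r in rows]
    for i in idx:
        masked[i][col] = None
    return masked, truth

def mode_impute(rows, col):
    """Baseline imputation for a categorical column: fill with the mode
    of the observed (non-missing) values."""
    observed = [r[col] for r in rows if r[col] is not None]
    mode = Counter(observed).most_common(1)[0][0]
    return [[mode if (j == col and v is None) else v
             for j, v in enumerate(r)] for r in rows]

def prediction_accuracy(imputed, truth, col):
    """Fraction of imputed values that match the original hidden ones."""
    return sum(imputed[i][col] == v for i, v in truth.items()) / len(truth)
```

A Bayesian-network imputer would replace `mode_impute` by predicting each missing value from the posterior over that attribute given the record's observed attributes; the same `prediction_accuracy` score then applies unchanged.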
Hruschka, E.R., Hruschka, E.R. & Ebecken, N.F.F. Bayesian networks for imputation in classification problems. J Intell Inf Syst 29, 231–252 (2007). https://doi.org/10.1007/s10844-006-0016-x