Bayesian networks for imputation in classification problems

Journal of Intelligent Information Systems

Abstract

Missing values are an important problem in data mining. To tackle this problem in classification tasks, we propose two imputation methods based on Bayesian networks. These methods are evaluated in the context of both prediction and classification tasks. We compare the obtained results with those achieved by classical imputation methods (Expectation–Maximization, Data Augmentation, Decision Trees, and Mean/Mode). Our simulations were performed on four datasets (Congressional Voting Records, Mushroom, Wisconsin Breast Cancer, and Adult), which are benchmarks for data mining methods. Missing values were simulated in these datasets by eliminating some known values, which makes it possible to assess the prediction capability of an imputation method by comparing the original values with the imputed ones. In addition, we propose a methodology to estimate the bias introduced by imputation methods in classification tasks. To this end, we use four classifiers (One Rule, Naïve Bayes, J4.8 Decision Tree, and PART) to evaluate the employed imputation methods in classification scenarios. The computing times consumed to perform the imputations are also reported. Simulation results in terms of prediction, classification, and computing time allow us to perform several analyses, leading to interesting conclusions. Bayesian networks have proved competitive with classical imputation methods.
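The prediction-task evaluation described above can be summarized as: remove known values at random, impute them, and score the imputation by how often the original values are restored. The snippet below is a minimal illustrative sketch of that protocol, not the paper's implementation: simple mode imputation stands in for the Bayesian-network imputers, and the helper names (simulate_missing, mode_impute, imputation_accuracy) and toy data are hypothetical. The paper's classification-task evaluation additionally retrains classifiers on the imputed data; that step is omitted here.

```python
# Minimal sketch (not the paper's implementation): evaluate an imputation
# method on the prediction task by blanking known values, imputing them, and
# comparing the imputed values against the originals. Mode imputation stands
# in for the Bayesian-network imputers; names and data are hypothetical.
import random
from collections import Counter

def simulate_missing(rows, missing_rate, seed=0):
    """Copy `rows`, set roughly `missing_rate` of the cells to None, and
    return the blanked copy plus the (row, col) positions that were removed."""
    rng = random.Random(seed)
    out = [list(r) for r in rows]
    blanked = []
    for i, row in enumerate(out):
        for j in range(len(row)):
            if rng.random() < missing_rate:
                out[i][j] = None
                blanked.append((i, j))
    return out, blanked

def mode_impute(rows):
    """Fill every None with the most frequent observed value of its column."""
    n_cols = len(rows[0])
    modes = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        modes.append(Counter(observed).most_common(1)[0][0] if observed else None)
    return [[modes[j] if v is None else v for j, v in enumerate(r)] for r in rows]

def imputation_accuracy(original, imputed, blanked):
    """Fraction of the artificially removed values that were restored correctly."""
    if not blanked:
        return float("nan")
    return sum(original[i][j] == imputed[i][j] for i, j in blanked) / len(blanked)

# Toy categorical dataset (hypothetical; not one of the UCI benchmarks).
data = [["y", "n", "y"], ["y", "y", "y"], ["n", "n", "y"], ["y", "n", "n"]]
incomplete, blanked = simulate_missing(data, missing_rate=0.3)
print("prediction accuracy:", imputation_accuracy(data, mode_impute(incomplete), blanked))
```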



Author information

Correspondence to Estevam R. Hruschka Jr.


About this article

Cite this article

Hruschka, E.R., Hruschka, E.R. & Ebecken, N.F.F. Bayesian networks for imputation in classification problems. J Intell Inf Syst 29, 231–252 (2007). https://doi.org/10.1007/s10844-006-0016-x

