Abstract
Missing values are an important problem in data mining. To tackle this problem in classification tasks, we propose two imputation methods based on Bayesian networks. These methods are evaluated in the context of both prediction and classification tasks, and the results are compared with those achieved by classical imputation methods (Expectation–Maximization, Data Augmentation, Decision Trees, and Mean/Mode). Our simulations were performed on four datasets (Congressional Voting Records, Mushroom, Wisconsin Breast Cancer, and Adult), which are benchmarks for data mining methods. Missing values were simulated in these datasets by eliminating some known values, making it possible to assess the prediction capability of an imputation method by comparing the original values with the imputed ones. In addition, we propose a methodology to estimate the bias introduced by imputation methods in classification tasks. To this end, four classifiers (One Rule, Naïve Bayes, J4.8 Decision Tree, and PART) are used to evaluate the imputation methods in classification scenarios. The computing times consumed to perform imputations are also reported. The simulation results in terms of prediction, classification, and computing time allow several analyses, leading to interesting conclusions; in particular, Bayesian networks have proven competitive with classical imputation methods.
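The evaluation protocol described in the abstract (hiding known values, imputing them, and comparing the imputed values against the originals) can be sketched minimally as follows. This is an illustrative sketch, not the paper's implementation: the helper names are hypothetical, and simple mode imputation stands in for the baseline Mean/Mode method the paper compares against.

```python
import random
from collections import Counter

def simulate_missing(rows, col, frac, seed=0):
    """Hide a fraction of known values in one column.

    Returns the masked dataset and a record of the hidden (true) values,
    so the imputation can later be scored against the originals.
    """
    rng = random.Random(seed)
    idx = rng.sample(range(len(rows)), int(frac * len(rows)))
    truth = {i: rows[i][col] for i in idx}
    masked = [list(r) for r in rows]
    for i in idx:
        masked[i][col] = None
    return masked, truth

def mode_impute(rows, col):
    """Baseline imputation for a categorical column: fill with the mode
    of the observed (non-missing) values."""
    observed = [r[col] for r in rows if r[col] is not None]
    mode = Counter(observed).most_common(1)[0][0]
    return [[mode if (j == col and v is None) else v
             for j, v in enumerate(r)] for r in rows]

def prediction_accuracy(imputed, truth, col):
    """Fraction of imputed values that match the original hidden ones."""
    return sum(imputed[i][col] == v for i, v in truth.items()) / len(truth)
```

A Bayesian-network imputer would replace `mode_impute` by predicting each missing value from the posterior over that attribute given the record's observed attributes; the same `prediction_accuracy` score then applies unchanged.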
Hruschka, E.R., Hruschka, E.R. & Ebecken, N.F.F. Bayesian networks for imputation in classification problems. J Intell Inf Syst 29, 231–252 (2007). https://doi.org/10.1007/s10844-006-0016-x