Abstract
This paper introduces an enhancement to Bayesian network learning that biases the search toward small networks with high predictive accuracy. Before the network-learning phase, the approach selects the subset of features that maximizes predictive accuracy. We explicitly examine the effects of two aspects of the algorithm: feature selection and node ordering. The resulting networks are computationally simpler to evaluate, yet display predictive accuracy comparable to that of Bayesian networks that model all attributes.
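The core idea of the abstract — score candidate feature subsets by predictive accuracy and commit to a subset before any network structure is learned — can be sketched as greedy forward selection. This is a hedged illustration, not the authors' actual procedure: the frequency-table classifier below stands in for evaluating a learned Bayesian network, and the dataset layout (rows as dicts, a named target attribute) is an assumption for the example.

```python
# Sketch: greedy forward feature selection that maximizes held-out
# predictive accuracy before any network learning takes place.
# The scoring classifier here is a simple stand-in, not the paper's method.
from collections import Counter, defaultdict

def accuracy(train, test, features, target):
    # Score a feature subset with a frequency-based classifier:
    # predict the most common target value seen for each feature tuple.
    counts = defaultdict(Counter)
    for row in train:
        counts[tuple(row[f] for f in features)][row[target]] += 1
    default = Counter(r[target] for r in train).most_common(1)[0][0]
    correct = 0
    for row in test:
        key = tuple(row[f] for f in features)
        pred = counts[key].most_common(1)[0][0] if counts[key] else default
        correct += pred == row[target]
    return correct / len(test)

def select_features(train, test, candidates, target):
    # Greedy forward selection: repeatedly add the single feature that
    # most improves accuracy; stop when no remaining feature helps.
    selected, best = [], accuracy(train, test, [], target)
    improved = True
    while improved:
        improved = False
        for f in set(candidates) - set(selected):
            score = accuracy(train, test, selected + [f], target)
            if score > best:
                best, best_f, improved = score, f, True
        if improved:
            selected.append(best_f)
    return selected, best
```

On a toy dataset where attribute `a` determines the class and `b` is irrelevant noise, the procedure keeps only `a` — mirroring the paper's claim that the reduced network can match the predictive accuracy of one modeling all attributes while being simpler to evaluate.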
Copyright information
© 1996 Springer-Verlag New York, Inc.
Cite this chapter
Provan, G.M., Singh, M. (1996). Learning Bayesian Networks Using Feature Selection. In: Fisher, D., Lenz, HJ. (eds) Learning from Data. Lecture Notes in Statistics, vol 112. Springer, New York, NY. https://doi.org/10.1007/978-1-4612-2404-4_28
Print ISBN: 978-0-387-94736-5
Online ISBN: 978-1-4612-2404-4
eBook Packages: Springer Book Archive