Abstract
In probabilistic classification problems, learning the Markov boundary of the class variable is the optimal approach to feature subset selection. In this paper we propose two algorithms that learn the Markov boundary of a selected variable. Both are based on the score+search paradigm for learning Bayesian networks: they use standard scoring functions but perform the search in constrained spaces of class-focused directed acyclic graphs, traversing these spaces by means of operators adapted to the problem. The algorithms have been validated experimentally on a wide spectrum of databases, and the results show performance competitive with the state of the art.
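For intuition, the Markov boundary of a class variable in a directed acyclic graph consists of its parents, its children, and the other parents of its children (its spouses); conditioned on this set, the class is independent of all remaining variables. The following minimal sketch (an illustration of the concept, not the algorithms proposed in the paper) reads off the Markov boundary from a DAG represented as a node-to-parents mapping:

```python
def markov_boundary(dag, target):
    """Markov boundary of `target` in a DAG given as {node: set(parents)}:
    the target's parents, its children, and the children's other parents."""
    parents = set(dag.get(target, set()))
    # children: nodes that list `target` among their parents
    children = {n for n, ps in dag.items() if target in ps}
    # spouses: other parents of the target's children
    spouses = {p for c in children for p in dag[c]} - {target}
    return (parents | children | spouses) - {target}

# Toy DAG: A -> C <- B, C -> D, E -> D; take C as the class variable.
dag = {"A": set(), "B": set(), "C": {"A", "B"}, "D": {"C", "E"}, "E": set()}
print(sorted(markov_boundary(dag, "C")))  # parents A, B; child D; spouse E
```

In a score+search setting such as the one described in the abstract, candidate graphs are explored with local operators and ranked by a scoring function, and the learned structure around the class variable then yields its Markov boundary in exactly this way.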
Responsible editor: Charles Elkan.
Cite this article
Acid, S., de Campos, L.M. & Fernández, M. Score-based methods for learning Markov boundaries by searching in constrained spaces. Data Min Knowl Disc 26, 174–212 (2013). https://doi.org/10.1007/s10618-011-0247-5