
Score-based methods for learning Markov boundaries by searching in constrained spaces

Published in: Data Mining and Knowledge Discovery

Abstract

In probabilistic classification problems, learning the Markov boundary of the class variable is the optimal approach to feature subset selection. In this paper we propose two algorithms that learn the Markov boundary of a selected variable. Both are based on the score+search paradigm for learning Bayesian networks: they use standard scoring functions but perform the search in constrained spaces of class-focused directed acyclic graphs, traversing the space by means of operators adapted to the problem. The algorithms have been validated experimentally on a wide spectrum of databases, and the results show performance competitive with the state of the art.
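The paper's algorithms search constrained spaces of class-focused DAGs with specialized operators; those details are not reproduced here. As a rough, minimal illustration of the general score+search idea the abstract describes, the sketch below performs greedy hill climbing over candidate parent sets of the class variable, scored with BIC. All names, the toggle-based neighborhood, and the use of a flat parent set (rather than a DAG space) are simplifications of my own, not the authors' method.

```python
import math
from collections import Counter, defaultdict

def bic_score(data, cls, parents):
    """BIC of the discrete class variable given a candidate parent set.

    data: list of dicts mapping variable name -> discrete value.
    """
    n = len(data)
    counts = defaultdict(Counter)  # parent configuration -> class-value counts
    for row in data:
        key = tuple(row[p] for p in parents)
        counts[key][row[cls]] += 1
    # Log-likelihood under maximum-likelihood conditional probabilities.
    ll = 0.0
    for cnt in counts.values():
        total = sum(cnt.values())
        for c in cnt.values():
            ll += c * math.log(c / total)
    r = len({row[cls] for row in data})        # class cardinality
    q = len(counts) if parents else 1          # observed parent configurations
    penalty = 0.5 * math.log(n) * q * (r - 1)  # BIC complexity term
    return ll - penalty

def greedy_markov_boundary(data, cls, candidates):
    """Hill climbing: toggle one candidate in/out while the score improves."""
    current = set()
    best = bic_score(data, cls, sorted(current))
    improved = True
    while improved:
        improved = False
        for v in candidates:
            trial = current ^ {v}  # symmetric difference: add or remove v
            s = bic_score(data, cls, sorted(trial))
            if s > best + 1e-12:
                current, best, improved = trial, s, True
    return current
```

For example, on synthetic data where the class copies `X1` and `X2` is independent noise, the search selects `{'X1'}` and rejects `X2`, since adding an irrelevant parent only increases the BIC penalty without improving the likelihood.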



Author information

Corresponding author

Correspondence to Luis M. de Campos.

Additional information

Responsible editor: Charles Elkan.


About this article

Cite this article

Acid, S., de Campos, L.M. & Fernández, M. Score-based methods for learning Markov boundaries by searching in constrained spaces. Data Min Knowl Disc 26, 174–212 (2013). https://doi.org/10.1007/s10618-011-0247-5

