Abstract
In probabilistic classification problems, learning the Markov boundary of the class variable is the optimal approach to feature subset selection. In this paper we propose two algorithms that learn the Markov boundary of a selected variable. Both are based on the score+search paradigm for learning Bayesian networks: they use standard scoring functions but perform the search in constrained spaces of class-focused directed acyclic graphs, traversing these spaces by means of operators adapted to the problem. The algorithms have been validated experimentally on a wide spectrum of databases, and the results show performance competitive with the state of the art.
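For intuition, the Markov boundary of a class variable in a directed acyclic graph consists of its parents, its children, and the other parents of its children (its spouses); conditioned on this set, the class is independent of all remaining variables. The following minimal sketch (an illustration of the concept, not the algorithms proposed in the paper) reads off the Markov boundary from a DAG represented as a node-to-parents mapping:

```python
def markov_boundary(dag, target):
    """Markov boundary of `target` in a DAG given as {node: set(parents)}:
    the target's parents, its children, and the children's other parents."""
    parents = set(dag.get(target, set()))
    # children: nodes that list `target` among their parents
    children = {n for n, ps in dag.items() if target in ps}
    # spouses: other parents of the target's children
    spouses = {p for c in children for p in dag[c]} - {target}
    return (parents | children | spouses) - {target}

# Toy DAG: A -> C <- B, C -> D, E -> D; take C as the class variable.
dag = {"A": set(), "B": set(), "C": {"A", "B"}, "D": {"C", "E"}, "E": set()}
print(sorted(markov_boundary(dag, "C")))  # parents A, B; child D; spouse E
```

In a score+search setting such as the one described in the abstract, candidate graphs are explored with local operators and ranked by a scoring function, and the learned structure around the class variable then yields its Markov boundary in exactly this way.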
Responsible editor: Charles Elkan.
Cite this article
Acid, S., de Campos, L.M. & Fernández, M. Score-based methods for learning Markov boundaries by searching in constrained spaces. Data Min Knowl Disc 26, 174–212 (2013). https://doi.org/10.1007/s10618-011-0247-5