Abstract
This paper applies a novel feature subset selection method, based on recent Bayesian network learning techniques, to high-dimensional genomic microarray data on type 2 diabetes. We report experiments on a database of 22,283 genes and only 143 patients. The method searches for the genes that are jointly most associated with diabetes status, which is achieved by learning the Markov boundary of the class variable. Since the selected genes are subsequently analyzed further by biologists, which requires much time and effort, not only model performance but also the robustness of the gene selection process is crucial. We therefore assess the variability of our results and propose an ensemble technique to yield more robust results. Our findings are compared with the genes associated with an increased risk of diabetes in the recent medical literature. The main outcomes of the present research are an improved understanding of the pathophysiology of obesity, and a clear appreciation of the applicability and limitations of Markov boundary learning techniques on human gene expression data.
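The ensemble idea described in the abstract — stabilizing gene selection by aggregating selections over resampled data — can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's Markov boundary algorithm: the base selector here is a hypothetical top-k correlation filter, and genes are ranked by how often they are chosen across bootstrap resamples.

```python
import random
from collections import Counter

def correlation(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return 0.0 if vx == 0 or vy == 0 else cov / (vx * vy) ** 0.5

def select_top_k(data, labels, k):
    """Illustrative base selector: top-k genes by |correlation| with the label."""
    scores = [(abs(correlation([row[j] for row in data], labels)), j)
              for j in range(len(data[0]))]
    return {j for _, j in sorted(scores, reverse=True)[:k]}

def ensemble_select(data, labels, k=5, n_boot=50, seed=0):
    """Run the base selector on bootstrap resamples and rank genes
    by selection frequency, yielding a more stable gene list."""
    rng = random.Random(seed)
    counts = Counter()
    n = len(data)
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap resample
        counts.update(select_top_k([data[i] for i in idx],
                                   [labels[i] for i in idx], k))
    return [j for j, _ in counts.most_common(k)]
```

Aggregating by selection frequency rather than trusting a single run is the standard remedy for the instability of feature selection on data with many more genes than patients: a gene selected in most resamples is less likely to be an artifact of a particular sample split.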
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Aussem, A., de Morais, S.R., Perraud, F., Rome, S. (2009). Robust Gene Selection from Microarray Data with a Novel Markov Boundary Learning Method: Application to Diabetes Analysis. In: Sossai, C., Chemello, G. (eds) Symbolic and Quantitative Approaches to Reasoning with Uncertainty. ECSQARU 2009. Lecture Notes in Computer Science(), vol 5590. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-02906-6_62
DOI: https://doi.org/10.1007/978-3-642-02906-6_62
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-02905-9
Online ISBN: 978-3-642-02906-6
eBook Packages: Computer Science (R0)