Robust Gene Selection from Microarray Data with a Novel Markov Boundary Learning Method: Application to Diabetes Analysis

  • Conference paper
Symbolic and Quantitative Approaches to Reasoning with Uncertainty (ECSQARU 2009)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5590)

Abstract

This paper applies a novel feature subset selection method, based on recent Bayesian network learning techniques, to high-dimensional genomic microarray data on type 2 diabetes. We report experiments on a database of 22,283 genes and only 143 patients. The method searches for the genes that are jointly most associated with diabetes status by learning the Markov boundary of the class variable. Because biologists subsequently analyze the selected genes further, which requires much time and effort, the robustness of the gene selection process is as crucial as model performance. We therefore assess the variability of our results and propose an ensemble technique that yields more robust results. Our findings are compared with the genes that the recent medical literature has associated with an increased risk of diabetes. The main outcomes of this research are an improved understanding of the pathophysiology of obesity and a clear appreciation of the applicability and limitations of Markov boundary learning techniques for human gene expression data.
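
The general idea of stabilizing a feature selector with an ensemble can be illustrated with a minimal sketch. It is not the paper's algorithm: the base selector below is a simple univariate stand-in (absolute difference of class means), not a Markov boundary learner, and the function names `top_k_by_mean_difference` and `ensemble_select`, the bootstrap resampling scheme, and the `min_frequency` threshold are all assumptions made for illustration.

```python
import random
from collections import Counter


def top_k_by_mean_difference(rows, labels, k):
    """Stand-in base selector (NOT a Markov boundary learner).

    Ranks features by the absolute difference between the class means
    and returns the indices of the k highest-scoring features.
    """
    pos = [r for r, y in zip(rows, labels) if y == 1]
    neg = [r for r, y in zip(rows, labels) if y == 0]
    if not pos or not neg:
        # A resample may contain a single class; select nothing in that case.
        return set()
    scores = []
    for j in range(len(rows[0])):
        mean_pos = sum(r[j] for r in pos) / len(pos)
        mean_neg = sum(r[j] for r in neg) / len(neg)
        scores.append((abs(mean_pos - mean_neg), j))
    scores.sort(reverse=True)
    return {j for _, j in scores[:k]}


def ensemble_select(rows, labels, k, n_runs=50, min_frequency=0.6, seed=0):
    """Run the base selector on bootstrap resamples and keep the features
    selected in at least `min_frequency` of the runs (hypothetical rule)."""
    rng = random.Random(seed)
    n = len(rows)
    counts = Counter()
    for _ in range(n_runs):
        # Bootstrap: draw n patients with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        sample_rows = [rows[i] for i in idx]
        sample_labels = [labels[i] for i in idx]
        counts.update(top_k_by_mean_difference(sample_rows, sample_labels, k))
    frequency = {j: c / n_runs for j, c in counts.items()}
    selected = sorted(j for j, f in frequency.items() if f >= min_frequency)
    return selected, frequency
```

Here `rows` would hold one list of expression values per patient and `labels` the 0/1 class status. The returned `frequency` map gives a rough picture of selection variability: genes chosen in nearly every resample are more robust candidates than genes chosen only occasionally.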

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Aussem, A., de Morais, S.R., Perraud, F., Rome, S. (2009). Robust Gene Selection from Microarray Data with a Novel Markov Boundary Learning Method: Application to Diabetes Analysis. In: Sossai, C., Chemello, G. (eds) Symbolic and Quantitative Approaches to Reasoning with Uncertainty. ECSQARU 2009. Lecture Notes in Computer Science, vol 5590. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-02906-6_62

  • DOI: https://doi.org/10.1007/978-3-642-02906-6_62

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-02905-9

  • Online ISBN: 978-3-642-02906-6

  • eBook Packages: Computer Science, Computer Science (R0)
