Abstract
This paper applies a novel feature subset selection method, based on recent Bayesian network learning techniques, to high-dimensional genomic microarray data on type 2 diabetes. We report experiments on a database of 22,283 genes and only 143 patients. The method searches for the genes that are jointly most associated with diabetes status, which is achieved by learning the Markov boundary of the class variable. Since the selected genes are subsequently analyzed further by biologists, which requires much time and effort, not only model performance but also the robustness of the gene selection process is crucial. We therefore assess the variability of our results and propose an ensemble technique to yield more robust results. Our findings are compared with the genes associated with an increased risk of diabetes in the recent medical literature. The main outcomes of the present research are an improved understanding of the pathophysiology of obesity, and a clear appreciation of the applicability and limitations of Markov boundary learning techniques on human gene expression data.
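The ensemble idea described in the abstract — stabilizing gene selection by aggregating selections over resampled data — can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's Markov boundary algorithm: the base selector here is a hypothetical top-k correlation filter, and genes are ranked by how often they are chosen across bootstrap resamples.

```python
import random
from collections import Counter

def correlation(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return 0.0 if vx == 0 or vy == 0 else cov / (vx * vy) ** 0.5

def select_top_k(data, labels, k):
    """Illustrative base selector: top-k genes by |correlation| with the label."""
    scores = [(abs(correlation([row[j] for row in data], labels)), j)
              for j in range(len(data[0]))]
    return {j for _, j in sorted(scores, reverse=True)[:k]}

def ensemble_select(data, labels, k=5, n_boot=50, seed=0):
    """Run the base selector on bootstrap resamples and rank genes
    by selection frequency, yielding a more stable gene list."""
    rng = random.Random(seed)
    counts = Counter()
    n = len(data)
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap resample
        counts.update(select_top_k([data[i] for i in idx],
                                   [labels[i] for i in idx], k))
    return [j for j, _ in counts.most_common(k)]
```

Aggregating by selection frequency rather than trusting a single run is the standard remedy for the instability of feature selection on data with many more genes than patients: a gene selected in most resamples is less likely to be an artifact of a particular sample split.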
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Aussem, A., de Morais, S.R., Perraud, F., Rome, S. (2009). Robust Gene Selection from Microarray Data with a Novel Markov Boundary Learning Method: Application to Diabetes Analysis. In: Sossai, C., Chemello, G. (eds) Symbolic and Quantitative Approaches to Reasoning with Uncertainty. ECSQARU 2009. Lecture Notes in Computer Science(), vol 5590. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-02906-6_62
DOI: https://doi.org/10.1007/978-3-642-02906-6_62
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-02905-9
Online ISBN: 978-3-642-02906-6
eBook Packages: Computer Science (R0)