Abstract
In recent years, gene expression profiling technologies have grown rapidly and are expected to provide insight into cancer-related cellular processes. Machine Learning algorithms, although extensively applied in many real-world areas, are still not widely used in the Bioinformatics community. We report on the successful application of a combination of two supervised Machine Learning methods, Bayesian Networks and the k Nearest Neighbours (k-NN) algorithm, to cancer class prediction problems on three high-dimensional DNA microarray datasets (Colon, Leukemia and NCI-60). Gene selection, an essential step in microarray domains, is performed by a sequential search engine, and the selected genes are then used to learn the Bayesian Network model. Once the genes have been selected for the Bayesian Network paradigm, we combine this paradigm with the well-known k-NN algorithm in order to improve classification accuracy.
This work was supported by the University of the Basque Country under grant UPV 140.226-EA186/96 and by the Gipuzkoako Foru Aldundi Txit Gorena under grant OF761/2003.
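The gene selection step mentioned in the abstract can be illustrated with a minimal sketch. It assumes a greedy sequential forward search (one common form of sequential search; the abstract does not say which variant the chapter uses) that scores each candidate gene subset by leave-one-out accuracy. For brevity, the classifier being scored here is plain k-NN rather than a Bayesian Network, and the way the chapter combines the two paradigms is not reproduced. The function names and the toy expression matrix are hypothetical and are not taken from the chapter or its datasets.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, features, k=3):
    """Majority vote among the k nearest training samples, using only `features`."""
    dists = sorted(
        (math.dist([row[f] for f in features], [x[f] for f in features]), label)
        for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


def loo_accuracy(X, y, features, k=3):
    """Leave-one-out accuracy of k-NN restricted to the given gene subset."""
    hits = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        hits += knn_predict(train_X, train_y, X[i], features, k) == y[i]
    return hits / len(X)


def sequential_forward_selection(X, y, score, max_genes=5):
    """Greedily add the gene that most improves `score`; stop when nothing improves."""
    selected, best = [], 0.0
    remaining = set(range(len(X[0])))
    while remaining and len(selected) < max_genes:
        cand, cand_score = max(
            ((g, score(X, y, selected + [g])) for g in sorted(remaining)),
            key=lambda t: t[1],
        )
        if cand_score <= best:
            break  # no candidate improves the estimated accuracy
        selected.append(cand)
        remaining.remove(cand)
        best = cand_score
    return selected, best


# Hypothetical toy data: 6 samples x 4 genes; only gene 2 separates the classes.
X = [
    [0.1, 5.0, 1.0, 0.3],
    [0.9, 4.8, 1.2, 0.7],
    [0.4, 5.1, 0.9, 0.1],
    [0.2, 5.2, 3.1, 0.6],
    [0.8, 4.9, 2.9, 0.2],
    [0.5, 5.0, 3.0, 0.9],
]
y = ["normal", "normal", "normal", "tumour", "tumour", "tumour"]

selected, acc = sequential_forward_selection(X, y, loo_accuracy)
prediction = knn_predict(X, y, [0.5, 5.0, 2.8, 0.4], selected)
print(selected, acc, prediction)
```

Because the scoring function is passed in as an argument, a different classifier, such as one based on a learned Bayesian Network, could be swapped in without changing the search loop.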
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Sierra, B., Lazkano, E., Martínez-Otzeta, J.M., Astigarraga, A. (2004). Combining Bayesian Networks, k Nearest Neighbours Algorithm and Attribute Selection for Gene Expression Data Analysis. In: Webb, G.I., Yu, X. (eds) AI 2004: Advances in Artificial Intelligence. AI 2004. Lecture Notes in Computer Science, vol 3339. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30549-1_8
DOI: https://doi.org/10.1007/978-3-540-30549-1_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-24059-4
Online ISBN: 978-3-540-30549-1
eBook Packages: Computer Science, Computer Science (R0)