
Combining Bayesian Networks, k Nearest Neighbours Algorithm and Attribute Selection for Gene Expression Data Analysis

  • Conference paper
AI 2004: Advances in Artificial Intelligence (AI 2004)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3339)

Abstract

In recent years, gene expression profiling technologies have grown rapidly and are expected to provide insight into cancer-related cellular processes. Machine Learning algorithms, which are extensively applied in many real-world areas, are still not popular in the Bioinformatics community. We report on the successful application of a combination of two supervised Machine Learning methods, Bayesian Networks and the k Nearest Neighbours (k-NN) algorithm, to cancer class prediction problems in three DNA microarray datasets of very high dimensionality (Colon, Leukemia and NCI-60). Gene selection, an essential step in microarray domains, is performed by a sequential search engine, and the selected genes are then used to learn the Bayesian Network model. Once the genes have been selected for the Bayesian Network paradigm, we combine this paradigm with the well-known k-NN algorithm in order to improve classification accuracy.
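
The abstract does not spell out the search procedure, so the following is only a minimal sketch of one common form of sequential search: greedy forward selection wrapped around a k-NN classifier and scored by leave-one-out accuracy. It assumes numeric expression values and Euclidean distance, and it leaves out the Bayesian Network step entirely. The function names (`knn_predict`, `loo_accuracy`, `forward_selection`) and the `max_genes` parameter are hypothetical and are not taken from the paper.

```python
from collections import Counter
from math import dist


def knn_predict(train_X, train_y, x, k=1):
    """Predict the label of x by majority vote among its k nearest training points."""
    # sorted() is stable, so distance ties are broken by training order.
    order = sorted(range(len(train_X)), key=lambda i: dist(train_X[i], x))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def loo_accuracy(X, y, genes, k=1):
    """Leave-one-out accuracy of k-NN restricted to the given gene (column) indices."""
    proj = [[row[g] for g in genes] for row in X]
    hits = 0
    for i in range(len(proj)):
        rest_X = proj[:i] + proj[i + 1:]
        rest_y = y[:i] + y[i + 1:]
        hits += knn_predict(rest_X, rest_y, proj[i], k) == y[i]
    return hits / len(proj)


def forward_selection(X, y, max_genes=5, k=1):
    """Greedy sequential forward selection (illustrative, not the paper's exact engine).

    At each step, add the single gene that yields the highest leave-one-out
    accuracy; stop when no gene improves the score or max_genes is reached.
    """
    selected, best = [], 0.0
    candidates = set(range(len(X[0])))
    while candidates and len(selected) < max_genes:
        # Score every remaining gene; ties go to the lowest gene index.
        scored = [(loo_accuracy(X, y, selected + [g], k), -g)
                  for g in sorted(candidates)]
        score, neg_g = max(scored)
        if score <= best:
            break
        selected.append(-neg_g)
        candidates.remove(-neg_g)
        best = score
    return selected, best
```

Calling `forward_selection(X, y)` on a list of samples `X` (one row of expression values per sample) and a list of class labels `y` returns the chosen gene indices together with their leave-one-out accuracy. A wrapper of this kind evaluates the classifier once per candidate gene per step, which is why such searches are usually stopped early on microarray data with thousands of genes.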

This work was supported by the University of the Basque Country under grant UPV 140.226-EA186/96 and by the Gipuzkoako Foru Aldundi Txit Gorena under grant OF761/2003.

Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Sierra, B., Lazkano, E., Martínez-Otzeta, J.M., Astigarraga, A. (2004). Combining Bayesian Networks, k Nearest Neighbours Algorithm and Attribute Selection for Gene Expression Data Analysis. In: Webb, G.I., Yu, X. (eds) AI 2004: Advances in Artificial Intelligence. AI 2004. Lecture Notes in Computer Science, vol 3339. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30549-1_8

  • DOI: https://doi.org/10.1007/978-3-540-30549-1_8

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-24059-4

  • Online ISBN: 978-3-540-30549-1

  • eBook Packages: Computer Science, Computer Science (R0)
