Abstract
Machine learning techniques have been recognized as powerful tools for the analysis of gene expression data. However, most learning techniques used in class prediction in gene expression analysis during the past years generate black-box models. Although the prediction accuracy of these models could be very well, they provide little insight into the biological facts. This paper holds the recognition that a more reasonable role for machine learning techniques is to generate hypotheses that can be verified or refined by human experts instead of making decisions for human experts. Based on this recognition, a general approach to generate comprehensible hypotheses from gene expression data is described and applied to human acute leukemias as a test case. The results demonstrate the feasibility of using machine learning techniques to help form hypotheses on the relationship between genes and certain diseases.
Preview
Unable to display preview. Download preview PDF.
Similar content being viewed by others
References
Albrecht, A., Vinterbo, S.A., Ohno-Machado, L.: An epicurean learning approach to gene-expression data classification. Artificial Intelligence in Medicine 28, 75–87 (2003)
Ben-Dor, A., Bruhn, L., Friedman, N., Nachman, I., Schummer, M., Yakhini, Z.: Tissue classification with gene expression profiles. Journal of Computational Biology 7, 559–584 (2000)
Bishop, J.F.: Adult acute myeloid leukaemia: update on treatment. Medical Journal of Australia 170, 39–43 (1999)
Cho, S.-B., Ryu, J.: Classifying gene expression data of cancer using classifier ensemble with mutually exclusive features. Proceedings of the IEEE 90, 1744–1753 (2002)
Dudoit, S., Fridlyand, J., Speed, T.P.: Comparison of discrimination methods for the classification of tumors using gene expression data. Journal of the American Statistical Association 97, 77–87 (2002)
Efron, B., Tibshirani, R.: An Introduction to the Bootstrap. Chapman & Hall, New York (1993)
Furey, T.S., Cristianini, N., Duffy, N., Bednarski, D.W., Schummer, M., Haussler, D.: Support vector machine classification and validation of cancer tissue samples using microarray expression data. Bioinformatics 16, 906–914 (2000)
Golub, T.R., Slonim, D.K., Tamayo, P., Huard, C., Gaasenbeek, M., Mesirov, J.P., Coller, H., Loh, M.L., Downing, J.R., Caligiuri, M.A., Bloomfield, C.D., Lander, E.S.: Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 286, 531–537 (1999)
Guyon, I., Weston, J., Barnhill, S., Vapnik, V.: Gene selection for cancer classification using support vector machines. Machine Learning 46, 389–422 (2002)
Hayashi, Y., Setiono, R., Yoshida, K.: A comparison between two neural network rule extraction techniques for the diagnosis of hepatobiliary disorders. Artificial Intelligence in Medicine 20, 205–216 (2000)
Khan, J., Wei, J.S., Ringner, M., Saal, L.H., Ladanyi, M., Westermann, F., Berthold, F., Schwab, M., Antonescu, C.R., Peterson, C., Meltzer, P.S.: Classification and diagnostic prediction of cancers using gene expression profiling and artificial neural networks. Nature Medicine 7, 673–679 (2001)
Li, J., Wong, L.: Identifying good diagnostic gene groups from gene expression profiles using the concept of emerging patterns. Bioinformatics 18, 725–734 (2002)
Li, W., Yang, Y.: How many genes are needed for a discriminant microarray data analysis. In: Lin, S.M., Johnson, K.F. (eds.) Methods of Microarray Data Analysis, pp. 137–150. Kluwer, Boston (2001)
Maughan, N.J., Lewis, F.A., Smith, V.: An introduction to arrays. Journal of Pathology 195, 3–6 (2001)
Mitchell, T.M.: Machine Learning. McGraw-Hill, New York (1997)
Mjolsness, E., DeCoste, D.: Machine learning for science: state of the art and future prospects. Science 293, 2051–2055 (2001)
Nguyen, D.V., Rocke, D.M.: Tumor classification by partial least squares using microarray gene expression data. Bioinformatics 18, 39–50 (2002)
Pui, C.H., Evans, W.E.: Acute lymphoblastic leukemia. New England Journal of Medicine 339, 605–615 (1998)
Quackenbush, J.: Computational analysis of microarray data. Nature Reviews Genetics 2, 418–427 (2001)
Quinlan, J.R.: C4.5: Programs for Machine Learning. Morgan Kaufmann, San Mateo (1993)
Setiono, R.: Generating concise and accurate classification rules for breast cancer diagnosis. Artificial Intelligence in Medicine 18, 205–219 (2000)
Tan, A.C., Gilbert, D.: Ensemble machine learning on gene expression data for cancer classification. Applied Bioinformatics 2, S75–S83 (2003)
Yun, Z., Keong, K.C.: Identifying simple discriminatory gene vectors with an information theory approach. In: Proceedings of the 4th IEEE Computational Systems Bioinformatics Conference, Stanford, CA, pp. 13–24 (2005)
Zhou, Z.-H.: Rule extraction: using neural networks or for neural networks? Journal of Computer Science & Technology 19, 249–253 (2004)
Zhou, Z.-H., Jiang, Y.: Medical diagnosis with C4.5 rule preceded by artificial neural network ensemble. IEEE Transactions on Information Technology in Biomedicine 7, 37–42 (2003)
Zhou, Z.-H., Jiang, Y.: NeC4.5: neural ensemble based C4.5. IEEE Transactions on Knowledge and Data Engineering 16, 770–773 (2004)
Zhou, Z.-H., Wu, J., Tang, W.: Ensembling neural networks: many could be better than all. Artificial Intelligence 137, 239–263 (2002)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Jiang, Y., Li, M., Zhou, ZH. (2006). Generation of Comprehensible Hypotheses from Gene Expression Data. In: Li, J., Yang, Q., Tan, AH. (eds) Data Mining for Biomedical Applications. BioDM 2006. Lecture Notes in Computer Science(), vol 3916. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11691730_12
Download citation
DOI: https://doi.org/10.1007/11691730_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-33104-9
Online ISBN: 978-3-540-33105-6
eBook Packages: Computer ScienceComputer Science (R0)