Abstract
We present results on predicting leukemia from microarray gene expression data. Our approach is data mining by rule induction based on rough set theory, using a novel methodology built on rule generations and cumulative rule sets. The final rule set contained only eight rules, involving combinations of eight genes. It correctly classified all cases in the training data set and all but one case in the testing data set. Moreover, six of the eight genes we identified are well known in the literature as relevant to leukemia.
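As background for the rough set basis mentioned above, the following is a minimal sketch of the lower and upper approximations of a concept, the standard rough set constructs that separate cases classified with certainty from cases classified only possibly. It is not the authors' implementation. The function name, the gene names, the discretized expression levels, and the class labels are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def approximations(cases, attributes, decision, concept):
    """Return (lower, upper) approximations of the cases labeled `concept`."""
    # Group cases that are indiscernible on the chosen attributes.
    blocks = defaultdict(set)
    for case_id, row in cases.items():
        key = tuple(row[a] for a in attributes)
        blocks[key].add(case_id)

    # The concept: all cases carrying the given decision value.
    target = {cid for cid, row in cases.items() if row[decision] == concept}

    lower, upper = set(), set()
    for block in blocks.values():
        if block <= target:   # block lies entirely inside the concept
            lower |= block
        if block & target:    # block overlaps the concept
            upper |= block
    return lower, upper


# Hypothetical toy data: discretized expression levels for two made-up genes.
cases = {
    1: {"gene_a": "high", "gene_b": "low", "class": "class_1"},
    2: {"gene_a": "high", "gene_b": "low", "class": "class_2"},
    3: {"gene_a": "low", "gene_b": "low", "class": "class_1"},
    4: {"gene_a": "low", "gene_b": "high", "class": "class_2"},
}

lower, upper = approximations(cases, ["gene_a", "gene_b"], "class", "class_1")
print(sorted(lower), sorted(upper))
```

In this toy table, cases 1 and 2 have identical attribute values but different classes, so case 1 falls only in the upper approximation of class_1, while case 3 belongs to the lower approximation.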
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Fang, J., Grzymala-Busse, J.W. (2006). Leukemia Prediction from Gene Expression Data—A Rough Set Approach. In: Rutkowski, L., Tadeusiewicz, R., Zadeh, L.A., Żurada, J.M. (eds) Artificial Intelligence and Soft Computing – ICAISC 2006. Lecture Notes in Computer Science, vol 4029. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11785231_94
DOI: https://doi.org/10.1007/11785231_94
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-35748-3
Online ISBN: 978-3-540-35750-6
eBook Packages: Computer Science (R0)