Summary
In this chapter we describe the application of a modern learning classifier system to a data mining task. In particular, in collaboration with a medical specialist, we apply XCS to a primary breast cancer data set. Our results indicate more effective knowledge discovery than with C4.5.
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Kharbat, F., Odeh, M., Bull, L. (2008). Knowledge Discovery from Medical Data: An Empirical Study with XCS. In: Bull, L., Bernadó-Mansilla, E., Holmes, J. (eds) Learning Classifier Systems in Data Mining. Studies in Computational Intelligence, vol 125. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78979-6_5
DOI: https://doi.org/10.1007/978-3-540-78979-6_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-78978-9
Online ISBN: 978-3-540-78979-6
eBook Packages: Engineering, Engineering (R0)