Knowledge Discovery from Medical Data: An Empirical Study with XCS

Chapter in: Learning Classifier Systems in Data Mining

Part of the book series: Studies in Computational Intelligence (SCI, volume 125)

Summary

In this chapter we describe the application of a modern learning classifier system to a data mining task. In particular, in collaboration with a medical specialist, we apply XCS to a primary breast cancer data set. Our results indicate more effective knowledge discovery than with C4.5.

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Kharbat, F., Odeh, M., Bull, L. (2008). Knowledge Discovery from Medical Data: An Empirical Study with XCS. In: Bull, L., Bernadó-Mansilla, E., Holmes, J. (eds) Learning Classifier Systems in Data Mining. Studies in Computational Intelligence, vol 125. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78979-6_5

  • DOI: https://doi.org/10.1007/978-3-540-78979-6_5

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-78978-9

  • Online ISBN: 978-3-540-78979-6

  • eBook Packages: Engineering (R0)
