Knowledge Discovery from Medical Data: An Empirical Study with XCS

Chapter in: Learning Classifier Systems in Data Mining

Part of the book series: Studies in Computational Intelligence (SCI, volume 125)

Summary

In this chapter we describe the application of a modern learning classifier system to a data mining task. In particular, in collaboration with a medical specialist, we apply XCS to a primary breast cancer data set. Our results indicate more effective knowledge discovery than with C4.5.

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Kharbat, F., Odeh, M., Bull, L. (2008). Knowledge Discovery from Medical Data: An Empirical Study with XCS. In: Bull, L., Bernadó-Mansilla, E., Holmes, J. (eds) Learning Classifier Systems in Data Mining. Studies in Computational Intelligence, vol 125. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78979-6_5

  • DOI: https://doi.org/10.1007/978-3-540-78979-6_5

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-78978-9

  • Online ISBN: 978-3-540-78979-6

  • eBook Packages: Engineering (R0)
