A Case Study for Learning from Imbalanced Data Sets

An, Aijun; Cercone, Nick; Huang, Xiangji

doi:10.1007/3-540-45153-6_1


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2056)

Included in the following conference series:

Conference of the Canadian Society for Computational Studies of Intelligence


Abstract

We present our experience in applying a rule induction technique to an extremely imbalanced pharmaceutical data set. We focus on using a variety of performance measures to evaluate a number of rule quality measures. We also investigate whether simply changing the distribution skew in the training data can improve predictive performance. Finally, we propose a method for adapting the learning algorithm to an extremely imbalanced environment. Our experimental results show that this adjustment improves predictive performance for rule quality formulas in which rule coverage contributes positively to the rule quality value.
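The abstract does not say which performance measures the chapter uses. As a hedged illustration of why accuracy alone can mislead on an extremely skewed data set, the Python sketch below computes accuracy alongside measures that focus on the minority class: precision, recall, the F-measure, and the geometric mean of the true-positive and true-negative rates. The function names (`confusion`, `measures`), the choice of measures, and the 1%-positive example data are assumptions made for illustration, not the authors' method.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Counts for a binary task where the minority class is 'positive'."""
    tp: int
    fp: int
    fn: int
    tn: int


def confusion(y_true, y_pred, positive=1):
    """Tally a 2x2 confusion matrix from parallel label sequences."""
    y_true, y_pred = list(y_true), list(y_pred)
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return Confusion(tp, fp, fn, tn)


def _ratio(num, den):
    # Define 0/0 as 0.0 so degenerate classifiers still receive a score.
    return num / den if den else 0.0


def measures(c):
    """Accuracy plus minority-focused measures.

    The measure set is chosen for illustration; it is not taken from the chapter.
    """
    accuracy = _ratio(c.tp + c.tn, c.tp + c.fp + c.fn + c.tn)
    precision = _ratio(c.tp, c.tp + c.fp)
    recall = _ratio(c.tp, c.tp + c.fn)          # true-positive rate
    specificity = _ratio(c.tn, c.tn + c.fp)     # true-negative rate
    f1 = _ratio(2 * precision * recall, precision + recall)
    g_mean = math.sqrt(recall * specificity)    # geometric mean of both rates
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
        "g_mean": g_mean,
    }


if __name__ == "__main__":
    # Hypothetical data: 10 positives among 1000 examples (1% minority class).
    y_true = [1] * 10 + [0] * 990
    # A trivial classifier that always predicts the majority class.
    always_negative = [0] * 1000
    print(measures(confusion(y_true, always_negative)))
```

On this hypothetical data the always-negative classifier reaches 0.99 accuracy while its recall, F-measure, and geometric mean are all 0.0. That gap is the kind of effect that makes a single accuracy figure a poor guide when one class is very rare.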



Author information

Authors and Affiliations

Department of Computer Science, University of Waterloo, Waterloo, Ontario N2L 3G1, Canada
Aijun An, Nick Cercone & Xiangji Huang


Editor information

Editors and Affiliations

Department of Computer Science, University of Alberta, Edmonton, AB, Canada, T6G 2E8
Eleni Stroulia
School of Information Technology and Engineering, University of Ottawa, Ottawa, ON, Canada, K1N 6N5
Stan Matwin

About this paper

Cite this paper

An, A., Cercone, N., Huang, X. (2001). A Case Study for Learning from Imbalanced Data Sets. In: Stroulia, E., Matwin, S. (eds) Advances in Artificial Intelligence. Canadian AI 2001. Lecture Notes in Computer Science, vol 2056. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45153-6_1


DOI: https://doi.org/10.1007/3-540-45153-6_1
Published: 16 May 2001
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-42144-3
Online ISBN: 978-3-540-45153-2
eBook Packages: Springer Book Archive
