Abstract
The deployment of classification models is an integral component of many modern data mining and machine learning applications. A typical classification model is built with the tacit assumption that the deployment scenario by which it is evaluated is fixed and fully characterized. Yet, in the practical deployment of classification methods, important aspects of the application environment, such as the misclassification costs, may be uncertain during model building. Moreover, a single classification model may be applied in several different deployment scenarios. In this work, we propose a method to optimize a model for uncertain deployment scenarios. We begin by deriving a relationship between two evaluation measures, H measure and cost curves, that may be used to address uncertainty in classifier performance. We show that when uncertainty in classifier performance is modeled as a probabilistic belief that is a function of this underlying relationship, a natural definition of risk emerges for both classifiers and instances. We then leverage this notion of risk to develop a boosting-based algorithm—which we call RiskBoost—that directly mitigates classifier risk, and we demonstrate that it outperforms AdaBoost on a diverse selection of datasets.
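To make the abstract's central idea concrete: in the cost-curve framework of Drummond and Holte, a classifier's normalized expected cost is linear in the probability-cost term. If uncertainty about deployment costs is expressed as a probabilistic belief over that term, a notion of classifier risk falls out as the expectation of cost under the belief. The sketch below illustrates this with a Beta-distributed belief; the Beta prior, its parameters, and the function names are illustrative assumptions, not the paper's actual derivation or the RiskBoost algorithm itself.

```python
import numpy as np

def normalized_expected_cost(fnr, fpr, pc):
    """Cost-curve value (Drummond & Holte):
    NEC(pc) = FNR * pc + FPR * (1 - pc),
    where pc is the probability-cost term combining class
    priors and misclassification costs."""
    return fnr * pc + fpr * (1.0 - pc)

def risk_under_belief(fnr, fpr, alpha, beta, n=100_000, seed=0):
    """Classifier risk as the mean NEC under a Beta(alpha, beta)
    belief over pc, estimated by Monte Carlo sampling."""
    rng = np.random.default_rng(seed)
    pc = rng.beta(alpha, beta, size=n)
    return normalized_expected_cost(fnr, fpr, pc).mean()

# Because NEC is linear in pc, the risk also has a closed form:
# E[NEC] = FNR * E[pc] + FPR * (1 - E[pc]), with E[pc] = a / (a + b).
```

Because the cost curve is linear in pc, only the mean of the belief matters for a single classifier's risk; richer beliefs become important when comparing classifiers whose cost curves cross, which is the regime where optimizing for hypothetical scenarios diverges from optimizing for a single fixed one.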
Copyright information
© 2015 Springer International Publishing Switzerland
Cite this paper
Johnson, R.A., Raeder, T., Chawla, N.V. (2015). Optimizing Classifiers for Hypothetical Scenarios. In: Cao, T., Lim, EP., Zhou, ZH., Ho, TB., Cheung, D., Motoda, H. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2015. Lecture Notes in Computer Science(), vol 9077. Springer, Cham. https://doi.org/10.1007/978-3-319-18038-0_21
DOI: https://doi.org/10.1007/978-3-319-18038-0_21
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-18037-3
Online ISBN: 978-3-319-18038-0
eBook Packages: Computer Science (R0)