Instance weighting versus threshold adjusting for cost-sensitive classification

  • Regular Paper
Knowledge and Information Systems

Abstract

In real-world classification problems, different types of misclassification errors often have asymmetric costs, which calls for cost-sensitive learning methods that attempt to minimize average misclassification cost rather than plain error rate. Instance weighting and post hoc threshold adjusting are two major approaches to cost-sensitive classifier learning. This paper compares the effects of these two approaches on several standard, off-the-shelf classification methods. The comparison indicates that the two approaches lead to similar results for some classification methods, such as Naïve Bayes, logistic regression, and backpropagation neural networks, but very different results for others, such as decision tree, decision table, and decision rule learners. These findings have important implications for the selection of a cost-sensitive classifier learning approach and for the interpretation of a recently published finding about the relative performance of Naïve Bayes and decision trees.
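
To make the two approaches named in the abstract concrete, the following is a minimal sketch for a two-class problem. It is not the paper's experimental procedure. All function names and the toy data are hypothetical, and the sketch assumes that correct predictions cost nothing, so only a false-positive cost `c_fp` and a false-negative cost `c_fn` matter. Threshold adjusting keeps the estimated probability unchanged and moves the decision cutoff; instance weighting reweights the training instances by the cost of misclassifying them and keeps the cutoff at 0.5.

```python
def cost_threshold(c_fp, c_fn):
    """Cutoff above which predicting positive has lower expected cost.

    Assumption: correct predictions cost nothing.
    """
    return c_fp / (c_fp + c_fn)


def predict_by_threshold(p_pos, c_fp, c_fn):
    """Threshold adjusting: unchanged probability, cost-based cutoff."""
    return 1 if p_pos > cost_threshold(c_fp, c_fn) else 0


def weighted_positive_rate(labels, c_fp, c_fn):
    """Instance weighting: positives weigh c_fn, negatives weigh c_fp."""
    w_pos = sum(c_fn for y in labels if y == 1)
    w_neg = sum(c_fp for y in labels if y == 0)
    return w_pos / (w_pos + w_neg)


def average_cost(y_true, y_pred, c_fp, c_fn):
    """Average misclassification cost over a set of predictions."""
    total = 0.0
    for truth, pred in zip(y_true, y_pred):
        if truth == 0 and pred == 1:
            total += c_fp
        elif truth == 1 and pred == 0:
            total += c_fn
    return total / len(y_true)


# Hypothetical group of training instances: 2 positives, 8 negatives.
group = [1, 1] + [0] * 8
C_FP, C_FN = 1.0, 10.0  # assumed costs: a missed positive is 10x worse

p_raw = sum(group) / len(group)                       # unweighted estimate
by_threshold = predict_by_threshold(p_raw, C_FP, C_FN)

p_weighted = weighted_positive_rate(group, C_FP, C_FN)
by_weighting = 1 if p_weighted > 0.5 else 0           # cutoff stays at 0.5

print(by_threshold, by_weighting)
print(average_cost(group, [1] * 10, C_FP, C_FN),
      average_cost(group, [0] * 10, C_FP, C_FN))
```

For a fixed group of instances the two rules agree, because `p > c_fp / (c_fp + c_fn)` is algebraically the same condition as the cost-weighted positive rate exceeding 0.5. The sketch therefore says nothing about learners whose fitted structure itself changes when the training instances are reweighted.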

Author information

Corresponding author

Correspondence to Huimin Zhao.

About this article

Cite this article

Zhao, H. Instance weighting versus threshold adjusting for cost-sensitive classification. Knowl Inf Syst 15, 321–334 (2008). https://doi.org/10.1007/s10115-007-0079-1

  • DOI: https://doi.org/10.1007/s10115-007-0079-1
