
A survey of cost-sensitive decision tree induction algorithms

Published: 12 March 2013

Abstract

The past decade has seen significant interest in the problem of inducing decision trees that take account of both the costs of misclassification and the costs of acquiring the features used for decision making. This survey identifies over 50 algorithms, including direct adaptations of accuracy-based methods as well as approaches that use genetic algorithms, anytime methods, boosting, and bagging. The survey brings together these different studies and novel approaches to cost-sensitive decision tree learning, provides a taxonomy and a historical timeline of how the field has developed, and should serve as a useful reference point for future research in this field.
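To make the abstract's topic concrete, the sketch below illustrates the core idea shared by many cost-sensitive classifiers: instead of predicting the most probable class, predict the class that minimizes expected misclassification cost. This is a minimal illustration only, not any particular surveyed algorithm; the function name, probabilities, and cost matrix are invented for the example.

```python
def min_expected_cost_class(probs, cost):
    """Return the index of the class with minimum expected misclassification cost.

    probs[j]   -- estimated probability that the true class is j
                  (e.g. class frequencies at a decision tree leaf)
    cost[i][j] -- cost of predicting class i when the true class is j
    """
    n = len(probs)
    # Expected cost of each candidate prediction i, averaged over true classes j.
    expected = [sum(cost[i][j] * probs[j] for j in range(n)) for i in range(n)]
    return min(range(n), key=lambda i: expected[i])


# Example: a leaf where class 0 is more probable, but mistaking a true
# class-1 example for class 0 costs ten times as much as the reverse error.
probs = [0.7, 0.3]
cost = [[0, 10],   # predict 0: free if correct, 10 if the truth is class 1
        [1, 0]]    # predict 1: 1 if the truth is class 0, free if correct
print(min_expected_cost_class(probs, cost))  # -> 1, despite P(class 0) > P(class 1)
```

With a uniform 0/1 cost matrix this reduces to ordinary most-probable-class prediction; the surveyed algorithms differ mainly in how (and at what stage of tree induction) such cost information is brought to bear.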




Published in

ACM Computing Surveys, Volume 45, Issue 2 (February 2013), 417 pages
ISSN: 0360-0300
EISSN: 1557-7341
DOI: 10.1145/2431211

      Copyright © 2013 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 12 March 2013
      • Accepted: 1 September 2011
      • Revised: 1 July 2011
      • Received: 1 April 2011


      Qualifiers

      • research-article
      • Research
      • Refereed
