
Improving decision tree performance by exception handling

International Journal of Automation and Computing

Abstract

This paper focuses on improving decision tree induction algorithms when a kind of tie appears during the rule generation procedure for specific training datasets. The tie occurs when the records in a leaf node contain equal proportions of the target class outcomes, so that majority voting cannot be applied. To resolve this exception, we propose basing the prediction on a naive Bayes (NB) estimate, the k-nearest neighbour (k-NN) method, and association rule mining (ARM). The other features used for splitting the parent nodes are also taken into consideration.
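
The tie-breaking fallback can be illustrated with a short sketch. The Python example below is a minimal illustration of the idea, not the paper's implementation: the function name, the tiny dataset, and the use of scikit-learn's GaussianNB and KNeighborsClassifier as the fallback learners are all assumptions made for demonstration.

```python
# Minimal sketch of leaf-node tie handling (illustrative, not the authors' code).
# When the records reaching a leaf are split evenly between the classes, majority
# voting is undefined, so the prediction is delegated to a secondary learner
# (here naive Bayes or k-NN) fitted on that leaf's records.
from collections import Counter

import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

def predict_at_leaf(leaf_X, leaf_y, query, fallback="nb"):
    """Classify `query` using the training records (leaf_X, leaf_y) of one leaf."""
    counts = Counter(leaf_y).most_common()
    # No tie: ordinary majority voting decides the leaf's label.
    if len(counts) == 1 or counts[0][1] > counts[1][1]:
        return counts[0][0]
    # Tie: equal class proportions in the leaf, so fall back to NB or k-NN.
    if fallback == "nb":
        model = GaussianNB()
    else:
        model = KNeighborsClassifier(n_neighbors=min(3, len(leaf_y)))
    model.fit(np.asarray(leaf_X), np.asarray(leaf_y))
    return model.predict(np.asarray(query).reshape(1, -1))[0]

# Usage: a leaf holding two records of each class (a tie), so NB breaks it.
X = [[1.0, 0.2], [0.9, 0.1], [0.2, 0.9], [0.1, 1.0]]
y = ["yes", "yes", "no", "no"]
print(predict_at_leaf(X, y, query=[0.95, 0.15]))  # -> "yes"
```

The same scheme extends to the ARM fallback and to the reuse of the parent nodes' splitting features described in the paper; the sketch covers only the NB and k-NN cases.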



Author information


Corresponding author

Correspondence to Appavu Alias Balamurugan Subramanian.

Additional information

Appavu Alias Balamurugan Subramanian is a Ph.D. candidate in the Department of Information and Communication Engineering at Anna University, Chennai, India. He is also a faculty member at Thiagarajar College of Engineering, Madurai, India.

His research interests include data mining and text mining.

S. Pramala studied in the Department of Information Technology at Thiagarajar College of Engineering, Madurai, India, and received the B.Tech. degree in information technology in May 2010.

Her research interests include data mining, with a focus on classification and prediction methods, and the mining of frequent patterns, associations, and correlations in test data.

B. Rajalakshmi was a final-year undergraduate student (2009) in information technology at Thiagarajar College of Engineering, Madurai, India, and received the B.Tech. degree in information technology in May 2010.

Her research interests include textual data mining, the use of classification techniques for efficient data retrieval, data pruning, and pre-processing.

Ramasamy Rajaram received the Ph.D. degree from Madurai Kamaraj University, India. He is a professor in the Department of Computer Science and Information Technology at Thiagarajar College of Engineering, Madurai, India.

His research interests include data mining and information security.


About this article

Cite this article

Subramanian, A.A.B., Pramala, S., Rajalakshmi, B. et al. Improving decision tree performance by exception handling. Int. J. Autom. Comput. 7, 372–380 (2010). https://doi.org/10.1007/s11633-010-0517-5

