Abstract
This paper focuses on improving decision tree induction algorithms when a particular kind of tie arises during rule generation on certain training datasets. The tie occurs when the target class outcomes are equally represented among a leaf node's records, so majority voting cannot be applied. To resolve this exception, we propose basing the prediction on the naive Bayes (NB) estimate, k-nearest neighbour (k-NN), and association rule mining (ARM). The other features used to split the parent nodes are also taken into consideration.
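The tie-handling idea can be sketched as follows. This is a minimal illustration under our own assumptions, not the authors' implementation. Records are assumed to be tuples of categorical feature values. The naive Bayes and association-rule scores are computed over the whole training set. k-NN uses Hamming distance with an assumed k = 3, and association rules are limited to single-item antecedents. All helper names (`majority_or_tie`, `nb_score`, `knn_vote`, `arm_confidence`, `predict_leaf`) are hypothetical.

```python
from collections import Counter
import math


def majority_or_tie(labels):
    """Majority vote at a leaf; returns (winner, tied), with winner None on a tie."""
    counts = Counter(labels)
    top = max(counts.values())
    tied = sorted(c for c, n in counts.items() if n == top)
    return (tied[0] if len(tied) == 1 else None), tied


def nb_score(train, query, cls):
    """log P(cls) + sum_i log P(x_i | cls), Laplace-smoothed, categorical features."""
    classes = {y for _, y in train}
    rows = [x for x, y in train if y == cls]
    score = math.log((len(rows) + 1) / (len(train) + len(classes)))
    for i, value in enumerate(query):
        n_values = len({x[i] for x, _ in train})  # distinct values of feature i
        hits = sum(1 for x in rows if x[i] == value)
        score += math.log((hits + 1) / (len(rows) + n_values))
    return score


def knn_vote(train, query, tied, k=3):
    """Vote of the k nearest records (Hamming distance) belonging to the tied classes."""
    pool = [(x, y) for x, y in train if y in tied]
    nearest = sorted(pool, key=lambda r: sum(a != b for a, b in zip(r[0], query)))[:k]
    votes = Counter(y for _, y in nearest)
    # max() keeps the first maximal class, i.e. the alphabetically first on a vote tie
    return max(tied, key=lambda c: votes[c])


def arm_confidence(train, query, cls):
    """Best confidence of rules (feature_i = query[i]) -> cls over all features i."""
    best = 0.0
    for i, value in enumerate(query):
        covered = [y for x, y in train if x[i] == value]  # records matching the antecedent
        if covered:
            best = max(best, covered.count(cls) / len(covered))
    return best


def predict_leaf(leaf_labels, train, query, method="nb"):
    """Use majority voting when possible; otherwise fall back to the chosen estimator."""
    winner, tied = majority_or_tie(leaf_labels)
    if winner is not None:
        return winner
    if method == "nb":
        return max(tied, key=lambda c: nb_score(train, query, c))
    if method == "knn":
        return knn_vote(train, query, tied)
    if method == "arm":
        return max(tied, key=lambda c: arm_confidence(train, query, c))
    raise ValueError(f"unknown method: {method}")
```

Restricting every fallback to the tied classes keeps the leaf's own evidence in force: the estimator only chooses among the outcomes that majority voting could not separate. Sorting the tied classes makes the sketch deterministic when a fallback score is itself tied.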
Author information
Appavu Alias Balamurugan Subramanian is a Ph.D. candidate in the Department of Information and Communication Engineering at Anna University, Chennai, India. He is also a faculty member at Thiagarajar College of Engineering, Madurai, India.
His research interests include data mining and text mining.
S. Pramala studied in the Department of Information Technology at Thiagarajar College of Engineering, Madurai, India (2009), and received the B.Tech. degree in information technology in May 2010.
Her research interests include data mining, particularly classification and prediction methods, and the mining of frequent patterns, associations, and correlations in test data.
B. Rajalakshmi was a final-year undergraduate student in information technology at Thiagarajar College of Engineering, Madurai, India, in 2009. She received the B.Tech. degree in information technology in May 2010.
Her research interests include text mining, the use of classification techniques for efficient data retrieval, data pruning, and preprocessing.
Ramasamy Rajaram received the Ph.D. degree from Madurai Kamaraj University, India. He is a professor in the Department of Computer Science and Information Technology at Thiagarajar College of Engineering, Madurai, India.
His research interests include data mining and information security.
Cite this article
Subramanian, A.A.B., Pramala, S., Rajalakshmi, B. et al. Improving decision tree performance by exception handling. Int. J. Autom. Comput. 7, 372–380 (2010). https://doi.org/10.1007/s11633-010-0517-5
DOI: https://doi.org/10.1007/s11633-010-0517-5