Abstract
When classification errors are inevitable, identifying the particular parts of a model that are more susceptible to error than others, instead of searching for the model's Achilles' heel in a casual way, may help uncover specific error-sensitive value patterns and lead to additional error-reduction measures. As an initial phase of this investigation, the study narrows the scope of the problem by focusing on decision trees as a pilot model. It develops a simple and effective tagging method that digitizes the individual nodes of a binary decision tree for node-level analysis, so that classification statistics can be linked and tracked for each node in a transparent way, the potentially "weakest" nodes and error-sensitive value patterns can be identified and examined, and cause analysis and enhancement development are supported.
This digitization method is not an attempt to re-develop or transform the existing decision tree model; rather, it is a pragmatic node-ID formulation that crafts numeric values to reflect the tree structure and decision-making paths, extending post-classification analysis to the detailed node level. Initial experiments have succeeded in locating potentially high-risk attributes and value patterns, an encouraging sign that this study is worth further exploration.
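The abstract does not reproduce the paper's exact ID formulation. As a minimal sketch of the general idea, assuming a classic heap-style numbering (root = 1, left child = 2·id, right child = 2·id + 1), the binary digits of a tag after the leading 1 encode the root-to-node decision path, so per-node classification statistics can be tracked in a flat map. All names below (`child_id`, `path_of`, `record`) are hypothetical illustrations, not the paper's API:

```python
# Hypothetical node-tagging sketch for a binary decision tree.
# Assumption: heap-style IDs (root = 1, left = 2*id, right = 2*id + 1);
# the paper's actual formulation may differ.

from collections import defaultdict

def child_id(node_id: int, go_right: bool) -> int:
    """Tag of the left or right child of a tagged node."""
    return 2 * node_id + (1 if go_right else 0)

def path_of(node_id: int) -> str:
    """Recover the root-to-node decision path encoded in a tag
    (bits after the leading 1: 0 = left, 1 = right)."""
    return bin(node_id)[3:].replace("0", "L").replace("1", "R")

# Per-node statistics keyed by tag: stats[tag] = [records_seen, errors]
stats = defaultdict(lambda: [0, 0])

def record(node_id: int, correct: bool) -> None:
    """Accumulate one classified record at a leaf/node."""
    stats[node_id][0] += 1
    if not correct:
        stats[node_id][1] += 1

# Example: root (1) -> right child (3) -> that child's left child (6).
assert child_id(1, True) == 3
assert child_id(3, False) == 6
assert path_of(6) == "RL"
```

Because every tag deterministically encodes its decision path, error counts aggregated per tag point directly at the "weakest" branches of the tree without modifying the tree model itself.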
Copyright information
© 2015 Springer International Publishing Switzerland
Cite this paper
Wu, W. (2015). Identify Error-Sensitive Patterns by Decision Tree. In: Perner, P. (eds) Advances in Data Mining: Applications and Theoretical Aspects. ICDM 2015. Lecture Notes in Computer Science(), vol 9165. Springer, Cham. https://doi.org/10.1007/978-3-319-20910-4_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-20909-8
Online ISBN: 978-3-319-20910-4