Abstract
The Alternating Decision Tree (ADTree) is a successful boosting-based classification model with a wide range of applications. Existing ADTree induction algorithms apply a “top-down” strategy to evaluate the best split at each boosting iteration, which is very time-consuming and thus unsuitable for modeling large data sets. This paper proposes BOAI, a fast ADTree induction algorithm based on “bottom-up” evaluation, which offers high performance on massive data without sacrificing classification accuracy. BOAI pre-sorts attribute values and dynamically evaluates splits with a bottom-up approach based on the VW-group structure. Together, these techniques eliminate a large amount of redundant sorting and computation during tree induction. Experimental results on both real and synthetic data sets show that BOAI outperforms the best existing ADTree induction algorithm by a significant margin. In a real case study, BOAI also performs better than TreeNet and Random Forests, which are regarded as efficient classification models.
This work is supported by the National ’863’ High-Tech Program of China under grant No. 2007AA01Z191 and by NSFC grants 60473051 and 60642004.
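The central idea in the abstract is that the per-attribute sorting needed for split evaluation can be done once, up front, and then reused at every boosting iteration, so each candidate split is scored by a single weighted scan. The Python sketch below illustrates only that general pre-sorting idea; it is not the authors' BOAI/VW-group implementation. The function names are illustrative assumptions, and the split score is a simplified form of the standard ADTree Z-criterion (the precondition term is omitted).

import math
import numpy as np

def presort(X):
    """Sort each numeric attribute once, up front; X is an (n, d) float array."""
    return [np.argsort(X[:, j], kind="mergesort") for j in range(X.shape[1])]

def best_threshold(x_col, order, y, w):
    """Scan one pre-sorted attribute and return (best_z, best_threshold).

    y holds class labels in {-1, +1}, w the current boosting weights, and
    order is the pre-sorted index array for this attribute. The score is a
    simplified ADTree Z-criterion, 2*(sqrt(W+ * W-)) summed over both sides
    of the cut; a smaller Z indicates a better split.
    """
    w_pos_total = w[y > 0].sum()
    w_neg_total = w[y < 0].sum()
    w_pos_left = w_neg_left = 0.0
    best_z, best_t = float("inf"), None
    for k in range(len(order) - 1):
        i = order[k]
        if y[i] > 0:
            w_pos_left += w[i]
        else:
            w_neg_left += w[i]
        v, v_next = x_col[order[k]], x_col[order[k + 1]]
        if v == v_next:                      # only cut between distinct values
            continue
        w_pos_right = w_pos_total - w_pos_left
        w_neg_right = w_neg_total - w_neg_left
        z = 2.0 * (math.sqrt(w_pos_left * w_neg_left)
                   + math.sqrt(w_pos_right * w_neg_right))
        if z < best_z:
            best_z, best_t = z, 0.5 * (v + v_next)
    return best_z, best_t

if __name__ == "__main__":
    # Toy data; in boosting, only the weights w change between iterations,
    # so the sorted orders are computed once and reused every time.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 5))
    y = np.where(X[:, 0] + rng.normal(scale=0.5, size=1000) > 0, 1, -1)
    w = np.full(1000, 1.0 / 1000)
    orders = presort(X)
    results = [best_threshold(X[:, j], orders[j], y, w) for j in range(X.shape[1])]
    print(min(results, key=lambda r: r[0]))

Because only the boosting weights change between iterations, the sorted orders never have to be recomputed; this is the kind of redundant sorting the abstract says BOAI eliminates, although the paper's bottom-up evaluation over VW-groups goes further than this sketch.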
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Yang, B., Wang, T., Yang, D., Chang, L. (2008). BOAI: Fast Alternating Decision Tree Induction Based on Bottom-Up Evaluation. In: Washio, T., Suzuki, E., Ting, K.M., Inokuchi, A. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2008. Lecture Notes in Computer Science, vol. 5012. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-68125-0_36
DOI: https://doi.org/10.1007/978-3-540-68125-0_36
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-68124-3
Online ISBN: 978-3-540-68125-0
eBook Packages: Computer Science, Computer Science (R0)