Abstract
The Alternating Decision Tree (ADTree) is a successful boosting-based classification model with a wide range of applications. Existing ADTree induction algorithms apply a “top-down” strategy to evaluate the best split at each boosting iteration, which is very time-consuming and thus unsuitable for modeling large data sets. This paper proposes BOAI, a fast ADTree induction algorithm based on “bottom-up” evaluation, which offers high performance on massive data without sacrificing classification accuracy. BOAI pre-sorts attribute values and dynamically evaluates splits with a bottom-up approach based on the VW-group structure. Together, these techniques eliminate a large amount of redundant sorting and computation during tree induction. Experimental results on both real and synthetic data sets show that BOAI outperforms the best existing ADTree induction algorithm by a significant margin. In a real case study, BOAI also performs better than TreeNet and Random Forests, which are regarded as efficient classification models.
This work is supported by the National ’863’ High-Tech Program of China under grant No. 2007AA01Z191 and by NSFC grants 60473051 and 60642004.
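The central idea in the abstract is that the per-attribute sorting needed for split evaluation can be done once, up front, and then reused at every boosting iteration, so each candidate split is scored by a single weighted scan. The Python sketch below illustrates only that general pre-sorting idea; it is not the authors' BOAI/VW-group implementation. The function names are illustrative assumptions, and the split score is a simplified form of the standard ADTree Z-criterion (the precondition term is omitted).

import math
import numpy as np

def presort(X):
    """Sort each numeric attribute once, up front; X is an (n, d) float array."""
    return [np.argsort(X[:, j], kind="mergesort") for j in range(X.shape[1])]

def best_threshold(x_col, order, y, w):
    """Scan one pre-sorted attribute and return (best_z, best_threshold).

    y holds class labels in {-1, +1}, w the current boosting weights, and
    order is the pre-sorted index array for this attribute. The score is a
    simplified ADTree Z-criterion, 2*(sqrt(W+ * W-)) summed over both sides
    of the cut; a smaller Z indicates a better split.
    """
    w_pos_total = w[y > 0].sum()
    w_neg_total = w[y < 0].sum()
    w_pos_left = w_neg_left = 0.0
    best_z, best_t = float("inf"), None
    for k in range(len(order) - 1):
        i = order[k]
        if y[i] > 0:
            w_pos_left += w[i]
        else:
            w_neg_left += w[i]
        v, v_next = x_col[order[k]], x_col[order[k + 1]]
        if v == v_next:                      # only cut between distinct values
            continue
        w_pos_right = w_pos_total - w_pos_left
        w_neg_right = w_neg_total - w_neg_left
        z = 2.0 * (math.sqrt(w_pos_left * w_neg_left)
                   + math.sqrt(w_pos_right * w_neg_right))
        if z < best_z:
            best_z, best_t = z, 0.5 * (v + v_next)
    return best_z, best_t

if __name__ == "__main__":
    # Toy data; in boosting, only the weights w change between iterations,
    # so the sorted orders are computed once and reused every time.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 5))
    y = np.where(X[:, 0] + rng.normal(scale=0.5, size=1000) > 0, 1, -1)
    w = np.full(1000, 1.0 / 1000)
    orders = presort(X)
    results = [best_threshold(X[:, j], orders[j], y, w) for j in range(X.shape[1])]
    print(min(results, key=lambda r: r[0]))

Because only the boosting weights change between iterations, the sorted orders never have to be recomputed; this is the kind of redundant sorting the abstract says BOAI eliminates, although the paper's bottom-up evaluation over VW-groups goes further than this sketch.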
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Yang, B., Wang, T., Yang, D., Chang, L. (2008). BOAI: Fast Alternating Decision Tree Induction Based on Bottom-Up Evaluation. In: Washio, T., Suzuki, E., Ting, K.M., Inokuchi, A. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2008. Lecture Notes in Computer Science, vol. 5012. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-68125-0_36
DOI: https://doi.org/10.1007/978-3-540-68125-0_36
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-68124-3
Online ISBN: 978-3-540-68125-0
eBook Packages: Computer Science, Computer Science (R0)