Learning Tree Augmented Naive Bayes for Ranking

Jiang, Liangxiao; Zhang, Harry; Cai, Zhihua; Su, Jiang

doi:10.1007/11408079_63

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3453)

Included in the following conference series:

International Conference on Database Systems for Advanced Applications

Abstract

Naive Bayes has been widely used in data mining as a simple and effective classification algorithm. Because its conditional independence assumption rarely holds, numerous algorithms have been proposed to improve naive Bayes; among them, tree augmented naive Bayes (TAN) [3] achieves a significant improvement in terms of classification accuracy while maintaining efficiency and model simplicity. In many real-world data mining applications, however, an accurate ranking is more desirable than a classification. It is therefore interesting to ask whether TAN also achieves a significant improvement in terms of ranking, measured by AUC (the area under the Receiver Operating Characteristic curve) [8,1]. Unfortunately, our experiments show that TAN performs even worse than naive Bayes in ranking. In response, we present a novel learning algorithm, called forest augmented naive Bayes (FAN), obtained by modifying the traditional TAN learning algorithm. We test our algorithm experimentally on all 36 data sets recommended by Weka [12] and compare it to naive Bayes, SBC [6], TAN [3], and C4.4 [10] in terms of AUC. The experimental results show that our algorithm significantly outperforms all the other algorithms in yielding accurate rankings. Our work provides an effective and efficient data mining algorithm for applications in which an accurate ranking is required.
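To make the distinction between classification accuracy and ranking quality concrete, the following is a minimal sketch of AUC for the two-class case, using the standard pairwise formulation: the fraction of (positive, negative) pairs in which the positive example receives the higher score. The function name `auc`, the 0.5 threshold, and the toy scores are illustrative assumptions and are not taken from the paper.

```python
from typing import Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Two-class AUC: share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy(scores: Sequence[float], labels: Sequence[int], threshold: float = 0.5) -> float:
    """Accuracy when an example is predicted positive iff its score >= threshold."""
    hits = sum(int(s >= threshold) == y for s, y in zip(scores, labels))
    return hits / len(labels)


# Hypothetical class-membership scores from two classifiers on the same six examples.
labels = [1, 1, 1, 0, 0, 0]
scores_a = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]
scores_b = [0.9, 0.8, 0.05, 0.6, 0.2, 0.1]

print(accuracy(scores_a, labels), auc(scores_a, labels))
print(accuracy(scores_b, labels), auc(scores_b, labels))
```

Both toy classifiers make the same two errors at the 0.5 threshold, so their accuracy is identical, yet the second ranks one positive example below every negative and therefore has a lower AUC. This is the sense in which a gain in accuracy need not carry over to ranking.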

References

1. Bradley, A.P.: The use of the area under the ROC curve in the evaluation of machine learning algorithms. Pattern Recognition 30, 1145–1159 (1997)
2. Cohen, W.W., Schapire, R.E., Singer, Y.: Learning to order things. Journal of Artificial Intelligence Research 10, 243–270 (1997)
3. Friedman, N., Geiger, D., Goldszmidt, M.: Bayesian Network Classifiers. Machine Learning 29, 103–130 (1997)
4. Hand, D.J., Till, R.J.: A simple generalisation of the area under the ROC curve for multiple class classification problems. Machine Learning 45, 171–186 (2001)
5. Keogh, E., Pazzani, M.: Learning augmented Bayesian classifiers. In: Proceedings of the Seventh International Workshop on AI and Statistics, Ft. Lauderdale (1999)
6. Langley, P., Sage, S.: Induction of selective Bayesian classifiers. In: Proceedings of the Tenth Conference on Uncertainty in Artificial Intelligence, pp. 399–406 (1994)
7. Merz, C., Murphy, P., Aha, D.: UCI repository of machine learning databases. Dept of ICS, University of California, Irvine (1997), http://www.ics.uci.edu/~mlearn/MLRepository.html
8. Provost, F., Fawcett, T.: Analysis and visualization of classifier performance: comparison under imprecise class and cost distribution. In: Proceedings of the Third International Conference on Knowledge Discovery and Data Mining, pp. 43–48. AAAI Press, Menlo Park (1997)
9. Provost, F., Fawcett, T., Kohavi, R.: The case against accuracy estimation for comparing induction algorithms. In: Proceedings of the Fifteenth International Conference on Machine Learning, pp. 445–453. Morgan Kaufmann, San Francisco (1998)
10. Provost, F.J., Domingos, P.: Tree Induction for Probability-Based Ranking. Machine Learning 52(3), 199–215 (2003)
11. Quinlan, J.R.: C4.5: Programs for Machine Learning. Morgan Kaufmann, San Mateo (1993)
12. http://prdownloads.sourceforge.net/weka/datasets-UCI.jar
13. Witten, I.H., Frank, E.: Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations. Morgan Kaufmann, San Francisco (2000)
14. Ling, C.X., Zhang, H.: Toward Bayesian classifiers with accurate probabilities. In: Proceedings of the Sixth Pacific-Asia Conference on KDD, pp. 123–134. Springer, Heidelberg (2002)

Author information

Authors and Affiliations

Department of Computer Science, China University of Geosciences, Wuhan, 430074, China
Liangxiao Jiang & Zhihua Cai
Faculty of Computer Science, University of New Brunswick, P.O. Box 4400, Fredericton, NB, E3B 5A3, Canada
Harry Zhang & Jiang Su

Editor information

Editors and Affiliations

Research Institute of Information Technology, Tsinghua National Laboratory for Information Science and Technology, Department of Computer Science and Technology, Tsinghua University, 100084, Beijing, China
Lizhu Zhou
National University of Singapore, Singapore
Beng Chin Ooi
School of Information, Renmin University of China,
Xiaofeng Meng

Cite this paper

Jiang, L., Zhang, H., Cai, Z., Su, J. (2005). Learning Tree Augmented Naive Bayes for Ranking. In: Zhou, L., Ooi, B.C., Meng, X. (eds) Database Systems for Advanced Applications. DASFAA 2005. Lecture Notes in Computer Science, vol 3453. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11408079_63

DOI: https://doi.org/10.1007/11408079_63
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-25334-1
Online ISBN: 978-3-540-32005-0
eBook Packages: Computer Science, Computer Science (R0)