One Dependence Augmented Naive Bayes

Jiang, Liangxiao; Zhang, Harry; Cai, Zhihua; Su, Jiang

doi:10.1007/11527503_22

Liangxiao Jiang²¹,
Harry Zhang²²,
Zhihua Cai²¹ &
…
Jiang Su²²

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 3584))

Included in the following conference series:

International Conference on Advanced Data Mining and Applications

2398 Accesses

Abstract

In real-world data mining applications, an accurate ranking is as important as an accurate classification. Naive Bayes has been widely used in data mining as a simple and effective classification and ranking algorithm. Since its conditional independence assumption is rarely true, numerous algorithms have been proposed to improve naive Bayes, for example, SBC[1] and TAN[2]. Indeed, the experimental results show that SBC and TAN achieve a significant improvement in term of classification accuracy. However, unfortunately, our experiments also show that SBC and TAN perform even worse than naive Bayes in ranking measured by AUC[3,4](the area under the Receiver Operating Characteristics curve). This fact raises the question of whether we can improve Naive Bayes with both accurate classification and ranking? In this paper, responding to this question, we present a new learning algorithm called One Dependence Augmented Naive Bayes(ODANB). Our motivation is to develop a new algorithm to improve Naive Bayes’ performance not only on classification measured by accuracy but also on ranking measured by AUC. We experimentally tested our algorithm, using the whole 36 UCI datasets recommended by Weka[5], and compared it to Naive Bayes, SBC and TAN. The experimental results show that our algorithm outperforms all the other algorithms significantly in yielding accurate ranking, yet at the same time outperforms all the other algorithms slightly in terms of classification accuracy.

This work was supported by Excellent Youth Foundation of China University of Geosciences(No.CUGQNL0505) and Natural Science Foundation of Hubei of China(No.2001ABB006 and No.2003ABA043).

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Impact of Feature Selection on Average Ranking Method via Metalearning

Bounding the difference between RankRC and RankSVM and application to multi-level rare class kernel ranking

Article 08 September 2017

An In-Depth Comparison of Neural and Probabilistic Tree Models for Learning-to-rank

References

Langley, P., Sage, S.: Induction of selective Bayesian classifiers. In: Proceedings of the Tenth Conference on Uncertainty in Artificial Intelligence, pp. 339–406 (1994)
Google Scholar
Friedman, N., Geiger, D., Goldszmidt, M.: Bayesian Network Classifiers. Machine Learning 29, 131–163 (1997)
Article MATH Google Scholar
Bradley, A.P.: The use of the area under the ROC curve in the evaluation of machine learning algorithms. Pattern Recognition 30, 1145–1159 (1997)
Article Google Scholar
Provost, F., Fawcett, T.: Analysis and visualization of classifier performance: comparison under imprecise class and cost distribution. In: Proceedings of the Third International Conference on Knowledge Discovery and Data Mining, pp. 43–48. AAAI Press, Menlo Park (1997)
Google Scholar
http://prdownloads.sourceforge.net/weka/datasets-UCI.jar
Bennett, P.N.: Assessing the Calibration of Naive Bayes’ Posterior Estimates. Technical Report No. CMU-CS100-155 (2000)
Google Scholar
Chickering, D.M.: Learning Bayesian networks is NP-Complete. In: Fisher, D., Lenz, H. (eds.) Learning from Data: Artificial Intelligence and Statistics V, pp. 121–130. Springer, Heidelberg (1996)
Google Scholar
Hand, D.J., Till, R.J.: A simple generalisation of the area under the ROC curve for multiple class classification problems. Machine Learning 45, 171–186 (2001)
Article MATH Google Scholar
Domingos, P., Pazzani, M.: Beyond Independence: Conditions for the Optimality of the Simple Bayesian Classifier. Machine Learning 29, 103–130 (1997)
Article MATH Google Scholar
Pearl, J.: Probabilistic Reasoning in Intelligent Systems. Morgan Kaufmann, San Francisco (1988)
Google Scholar
Witten, I.H., Frank, E.: Data Mining-Practical Machine Learning Tools and Techniques with Java Implementation. Morgan Kaufmann, San Francisco (2000)
Google Scholar
Nadeau, C., Bengio, Y.: Inference for the generalization error. In: Advances in Neural In- formation Processing Systems 12, pp. 307–313. MIT Press, Cambridge (1999)
Google Scholar

Download references

Author information

Authors and Affiliations

Faculty of Computer Science, China University of Geosciences, Wuhan, 430074, P.R.China
Liangxiao Jiang & Zhihua Cai
Faculty of Computer Science, University of New Brunswick, P.O. Box 4400, Fredericton, NB, E3B 5A3, Canada
Harry Zhang & Jiang Su

Authors

Liangxiao Jiang
View author publications
You can also search for this author in PubMed Google Scholar
Harry Zhang
View author publications
You can also search for this author in PubMed Google Scholar
Zhihua Cai
View author publications
You can also search for this author in PubMed Google Scholar
Jiang Su
View author publications
You can also search for this author in PubMed Google Scholar

Editor information

Editors and Affiliations

School of Information Technology and Electrical Engineering, The University of Queensland, 4072, Brisbane, Queensland, Australia
Xue Li
The State Key Laboratory for Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, 430072, Wuhan, China
Shuliang Wang
School of ITEE, The Univ of Queensland, St. Lucia, 4072, QLD, Australia
Zhao Yang Dong

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Jiang, L., Zhang, H., Cai, Z., Su, J. (2005). One Dependence Augmented Naive Bayes. In: Li, X., Wang, S., Dong, Z.Y. (eds) Advanced Data Mining and Applications. ADMA 2005. Lecture Notes in Computer Science(), vol 3584. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11527503_22

Download citation

DOI: https://doi.org/10.1007/11527503_22
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-27894-8
Online ISBN: 978-3-540-31877-4
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics