Learning k-Nearest Neighbor Naive Bayes for Ranking

Jiang, Liangxiao; Zhang, Harry; Su, Jiang

doi:10.1007/11527503_21

Liangxiao Jiang²¹,
Harry Zhang²² &
Jiang Su²²

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 3584))

Included in the following conference series:

International Conference on Advanced Data Mining and Applications

2429 Accesses
10 Citations

Abstract

Accurate probability-based ranking of instances is crucial in many real-world data mining applications. KNN (k-nearest neighbor) [1] has been intensively studied as an effective classification model in decades. However, its performance in ranking is unknown. In this paper, we conduct a systematic study on the ranking performance of KNN. At first, we compare KNN and KNNDW (KNN with distance weighted) to decision trees and naive Bayes in ranking, measured by AUC (the area under the Receiver Operating Characteristics curve). Then, we propose to improve the ranking performance of KNN by combining KNN with naive Bayes. The idea is that a naive Bayes is learned using the k nearest neighbors of the test instance as the training data and used to classify the test instance. A critical problem in combining KNN with naive Bayes is the lack of training data when k is small. We propose to deal with it using sampling to expand the training data. That is, each of the k nearest neighbors is “cloned” and the clones are added to the training data. We call our new model instance cloning local naive Bayes (simply ICLNB). We conduct extensive empirical comparison for the related algorithms in two groups in terms of AUC, using the 36 UCI datasets recommended by Weka[2]. In the first group, we compare ICLNB with other types of algorithms C4.4[3], naive Bayes and NBTree[4]. In the second group, we compare ICLNB with KNN, KNNDW and LWNB[5]. Our experimental results show that ICLNB outperforms all those algorithms significantly. From our study, we have two conclusions. First, KNN-relates algorithms performs well in ranking. Second, our new algorithm ICLNB performs best among the algorithms compared in this paper, and could be used in the applications in which an accurate ranking is desired.

This work was supported by Excellent Youth Foundation of China University of Geosciences(No.CUGQNL0505) and Natural Science Foundation of Hubei of China(No.2001ABB006 and No.2003ABA043).

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 84.99; Price excludes VAT (USA)

Softcover Book: USD 109.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

References

Aha, D.W., Kibler, D., Albert, M.K.: Instance-Based Learning Algorithms. Machine Learning 6, 37–66 (1991)
Google Scholar
http://prdownloads.sourceforge.net/weka/datasets-UCI.jar
Provost, F.J., Domingos, P.: Tree Induction for Probability-Based Ranking. Machine Learning 52(3), 199–215 (2003)
Article MATH Google Scholar
Kohavi, R.: Scaling Up the Accuracy of Naive-Bayes Classifiers: A Decision-Tree Hybrid. In: Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (KDD 1996), pp. 202–207. AAAI Press, Menlo Park (1996)
Google Scholar
Frank, E., Hall, M., Pfahringer, B.: Locally Weighted Naive Bayes. In: Proceedings of the Conference on Uncertainty in Artificial Intelligence (2003), pp. 249–256. Morgan Kaufmann, San Francisco (2003)
Google Scholar
Provost, F., Fawcett, T.: Analysis and visualization of classifier performance: comparison under imprecise class and cost distribution. In: Proceedings of the Third International Conference on Knowledge Discovery and Data Mining, pp. 43–48. AAAI Press, Menlo Park (1997)
Google Scholar
Hand, D.J., Till, R.J.: A simple generalisation of the area under the ROC curve for multiple class classification problems. Machine Learning 45, 171–186 (2001)
Article MATH Google Scholar
Ling, C.X., Yan, R.J.: Decision Tree with Better Ranking. In: Proceedings of the 20th International Conference on Machine Learning, pp. 480–487. Morgan Kaufmann, San Francisco (2003)
Google Scholar
Huang, J., Lu, J., Ling, C.X.: Comparing Naive Bayes, Decision Trees, and SVM with AUC and Accuracy. In: Proceedings of the Third IEEE International Conference on Data Mining, pp. 553–556. IEEE Computer Society Press, Los Alamitos (2003)
Chapter Google Scholar
Quinlan, J.R.: C4.5: Programs for Machine Learning. Morgan Kaufmann, San Mateo (1993)
Google Scholar
Zhang, H., Su, J.: Naive Bayesian Classifiers for Ranking. In: Boulicaut, J.-F., Esposito, F., Giannotti, F., Pedreschi, D. (eds.) ECML 2004. LNCS (LNAI), vol. 3201, pp. 501–512. Springer, Heidelberg (2004)
Chapter Google Scholar
Xie, Z., Hsu, W., Liu, Z., Lee, M.: SNNB: A Selective Neighborhood Based Naive Bayes for Lazy Learning. In: Proceedings of the Sixth Pacific-Asia Conference on KDD, pp. 104–114. Springer, Heidelberg (2002)
Google Scholar
Zheng, Z., Webb, G.I.: Lazy Learning of Bayesian Rules. Machine Learning 41(1), 53–84 (2000)
Article Google Scholar
Witten, I.H., Frank, E.: Data Mining –Practical Machine Learning Tools and Techniques with Java Implementation. Morgan Kaufmann, San Francisco (2000)
Google Scholar

Download references

Author information

Authors and Affiliations

Faculty of Computer Science, China University of Geosciences, Wuhan, 430074, P.R.China
Liangxiao Jiang
Faculty of Computer Science, University of New Brunswick, P.O. Box 4400, Fredericton, NB, E3B 5A3, Canada
Harry Zhang & Jiang Su

Authors

Liangxiao Jiang
View author publications
You can also search for this author in PubMed Google Scholar
Harry Zhang
View author publications
You can also search for this author in PubMed Google Scholar
Jiang Su
View author publications
You can also search for this author in PubMed Google Scholar

Editor information

Editors and Affiliations

School of Information Technology and Electrical Engineering, The University of Queensland, 4072, Brisbane, Queensland, Australia
Xue Li
The State Key Laboratory for Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, 430072, Wuhan, China
Shuliang Wang
School of ITEE, The Univ of Queensland, St. Lucia, 4072, QLD, Australia
Zhao Yang Dong

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Jiang, L., Zhang, H., Su, J. (2005). Learning k-Nearest Neighbor Naive Bayes for Ranking. In: Li, X., Wang, S., Dong, Z.Y. (eds) Advanced Data Mining and Applications. ADMA 2005. Lecture Notes in Computer Science(), vol 3584. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11527503_21

Download citation

DOI: https://doi.org/10.1007/11527503_21
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-27894-8
Online ISBN: 978-3-540-31877-4
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics