Instance Cloning Local Naive Bayes

  • Conference paper
Advances in Artificial Intelligence (Canadian AI 2005)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3501)

Abstract

The instance-based k-nearest neighbor algorithm (KNN) [1] is an effective classification model. It classifies a test instance simply by a vote within the neighborhood consisting of the test instance's k nearest neighbors. Recently, researchers have been interested in deploying a more sophisticated local model, such as naive Bayes, within the neighborhood. The expectation is that there are no strong dependences within the neighborhood of the test instance, which alleviates the conditional independence assumption of naive Bayes. Generally, the smaller the neighborhood (the value of k), the smaller the chance of encountering strong dependences. When k is small, however, the training data for the local naive Bayes are scarce, and its classification would be inaccurate. In existing models such as LWNB [3], a relatively large k is chosen, with the consequence that strong dependences seem unavoidable.
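To make the idea of a local naive Bayes concrete, here is a minimal sketch: naive Bayes is trained only on the k nearest neighbors of the test instance. It is illustrative only, not the paper's exact method; it assumes numeric attributes, Euclidean distance, and scikit-learn's GaussianNB, whereas the paper's experiments use nominal UCI attributes.

```python
# A minimal sketch of local naive Bayes (illustrative, not the paper's
# exact method): train naive Bayes only on the k nearest neighbors of
# the test instance. Assumes numeric attributes and scikit-learn.
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import NearestNeighbors

def local_nb_predict(X_train, y_train, x_test, k=30):
    # Find the indices of the k nearest training instances.
    nn = NearestNeighbors(n_neighbors=k).fit(X_train)
    _, idx = nn.kneighbors(x_test.reshape(1, -1))
    # Train naive Bayes on the neighborhood only, then classify x_test.
    nb = GaussianNB().fit(X_train[idx[0]], y_train[idx[0]])
    return nb.predict(x_test.reshape(1, -1))[0]
```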

In our opinion, a small k should be preferred in order to avoid strong dependences. We propose to deal with the resulting shortage of local training data by sampling (cloning). Given a test instance, clones of each instance in the neighborhood are generated according to its similarity to the test instance and added to the local training data. A local naive Bayes is then trained on the expanded training data. Since a relatively small k is chosen, the chance of encountering strong dependences within the neighborhood is small, so the classification of the resulting local naive Bayes should be more accurate. We experimentally compare our new algorithm with KNN and its improved variants in terms of classification accuracy, using the 36 UCI datasets recommended by Weka [8]; the results show that our algorithm outperforms all of those algorithms significantly and consistently at various values of k.
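The cloning step can be sketched as follows. The abstract does not specify the similarity measure or the clone-count rule, so this sketch assumes a hypothetical similarity (one minus normalized Euclidean distance) and a clone count proportional to that similarity; both are illustrative assumptions, not the authors' formulas.

```python
# A sketch of instance cloning before training the local naive Bayes.
# The similarity measure and clone-count rule below are assumptions;
# the paper defines its own similarity over nominal attributes.
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import NearestNeighbors

def icl_nb_predict(X_train, y_train, x_test, k=10, max_clones=10):
    # Keep k small to reduce the chance of strong attribute dependences.
    nn = NearestNeighbors(n_neighbors=k).fit(X_train)
    dist, idx = nn.kneighbors(x_test.reshape(1, -1))
    neighbors, labels = X_train[idx[0]], y_train[idx[0]]
    # Similarity in [0, 1]: the closest neighbor gets a value near 1.
    sim = 1.0 - dist[0] / (dist[0].max() + 1e-12)
    # Clone each neighbor in proportion to its similarity (>= 1 copy).
    counts = 1 + np.rint(sim * max_clones).astype(int)
    X_local = np.repeat(neighbors, counts, axis=0)
    y_local = np.repeat(labels, counts)
    # Train naive Bayes on the expanded local training data and classify.
    nb = GaussianNB().fit(X_local, y_local)
    return nb.predict(x_test.reshape(1, -1))[0]
```

Note that cloning an instance an integer number of times is equivalent to weighting it; with estimators that accept a sample_weight argument, the explicit expansion above could be replaced by weighted fitting.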

References

  1. Aha, D.W., Kibler, D., Albert, M.K.: Instance-Based Learning Algorithms. Machine Learning 6, 37–66 (1991)

  2. Domingos, P., Pazzani, M.: Beyond Independence: Conditions for the Optimality of the Simple Bayesian Classifier. Machine Learning 29, 103–130 (1997)

  3. Frank, E., Hall, M., Pfahringer, B.: Locally Weighted Naive Bayes. In: Proceedings of the Conference on Uncertainty in Artificial Intelligence, pp. 249–256. Morgan Kaufmann, San Francisco (2003)

  4. Kohavi, R.: Scaling Up the Accuracy of Naive-Bayes Classifiers: A Decision-Tree Hybrid. In: Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (KDD 1996), pp. 202–207. AAAI Press, Menlo Park (1996)

  5. Langley, P., Iba, W., Thompson, K.: An Analysis of Bayesian Classifiers. In: Proceedings of the Tenth National Conference on Artificial Intelligence, pp. 223–228. AAAI Press, Menlo Park (1992)

  6. Merz, C., Murphy, P., Aha, D.: UCI repository of machine learning databases. Dept of ICS, University of California, Irvine (1997), http://www.ics.uci.edu/~mlearn/MLRepository.html

  7. Nadeau, C., Bengio, Y.: Inference for the generalization error. In: Advances in Neural Information Processing Systems, vol. 12, pp. 307–313. MIT Press, Cambridge (1999)

  8. Weka collection of UCI datasets, http://prdownloads.sourceforge.net/weka/datasets-UCI.jar

  9. Weiss, G., Provost, F.: Learning when Training Data are Costly: The Effect of Class Distribution on Tree Induction. Journal of Artificial Intelligence Research 19, 315–354 (2003)

  10. Witten, I.H., Frank, E.: Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations. Morgan Kaufmann, San Francisco (2000)

  11. Xie, Z., Hsu, W., Liu, Z., Lee, M.: SNNB: A Selective Neighborhood Based Naïve Bayes for Lazy Learning. In: Proceedings of the Sixth Pacific-Asia Conference on KDD, pp. 104–114. Springer, Heidelberg (2002)

  12. Zheng, Z., Webb, G.I.: Lazy Learning of Bayesian Rules. Machine Learning 41(1), 53–84 (2000)

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Jiang, L., Zhang, H., Su, J. (2005). Instance Cloning Local Naive Bayes. In: Kégl, B., Lapalme, G. (eds) Advances in Artificial Intelligence. Canadian AI 2005. Lecture Notes in Computer Science, vol 3501. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11424918_29

  • DOI: https://doi.org/10.1007/11424918_29

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-25864-3

  • Online ISBN: 978-3-540-31952-8

  • eBook Packages: Computer Science (R0)
