Abstract
It is well known that in order to build a strong ensemble, the component learners should have high diversity as well as high accuracy. If perturbing the training set can cause significant changes in the component learners constructed, then Bagging can effectively improve accuracy. However, for stable learners such as nearest neighbor classifiers, perturbing the training set can hardly produce diverse component learners, and therefore Bagging does not work well. This paper adapts Bagging to nearest neighbor classifiers by injecting randomness into the distance metrics. In constructing the component learners, both the training set and the distance metric employed for identifying the neighbors are perturbed. A large-scale empirical study reported in this paper shows that the proposed BagInRand algorithm can effectively improve the accuracy of nearest neighbor classifiers.
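The abstract describes the core idea only at a high level: each component nearest neighbor learner is trained on a bootstrap sample and uses a randomly perturbed distance metric, and the components are combined by voting. The following minimal sketch illustrates one way this could look; the class name BagInRandSketch and the particular perturbation used (random feature weights in a weighted Euclidean distance) are illustrative assumptions, not the paper's exact scheme.

```python
# Illustrative sketch of the idea in the abstract: bootstrap sampling plus a
# randomly perturbed distance metric per component, combined by majority vote.
# The weighted-Euclidean perturbation below is an assumption for illustration.

import numpy as np


class BagInRandSketch:
    def __init__(self, n_components=10, k=1, rng=None):
        self.n_components = n_components   # number of component k-NN learners
        self.k = k                         # neighbors consulted per component
        self.rng = np.random.default_rng(rng)
        self.components_ = []              # list of (X_boot, y_boot, metric weights)

    def fit(self, X, y):
        X, y = np.asarray(X, dtype=float), np.asarray(y)
        n, d = X.shape
        for _ in range(self.n_components):
            idx = self.rng.integers(0, n, size=n)      # bootstrap sample of the training set
            w = self.rng.uniform(0.0, 1.0, size=d)     # randomness injected into the metric
            self.components_.append((X[idx], y[idx], w))
        return self

    def _predict_one(self, Xb, yb, w, x):
        # weighted Euclidean distance using this component's random weights
        dist = np.sqrt(((Xb - x) ** 2 * w).sum(axis=1))
        nearest = np.argsort(dist)[: self.k]
        labels, counts = np.unique(yb[nearest], return_counts=True)
        return labels[np.argmax(counts)]

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        preds = []
        for x in X:
            votes = [self._predict_one(Xb, yb, w, x)
                     for Xb, yb, w in self.components_]
            labels, counts = np.unique(votes, return_counts=True)
            preds.append(labels[np.argmax(counts)])    # majority vote over components
        return np.array(preds)
```

Because a plain nearest neighbor classifier is stable under bootstrap sampling alone, the per-component metric randomization is what supplies the diversity that the ensemble relies on.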
Author information
Additional information
Supported by the National Outstanding Youth Foundation of China under Grant No.60325207, the Fok Ying Tung Education Foundation under Grant No.91067, and the Excellent Young Teachers Program of MOE of China.
Zhi-Hua Zhou received the B.Sc., M.Sc., and Ph.D. degrees in computer science from Nanjing University, China, in 1996, 1998, and 2000, respectively, all with the highest honor. He joined the Department of Computer Science & Technology of Nanjing University as a lecturer in 2001, and is currently a professor and leader of the LAMDA Group. His research interests include machine learning, data mining, pattern recognition, information retrieval, neural computing, and evolutionary computing. In these areas he has published over 40 technical papers in refereed international journals and conference proceedings. He has won the Microsoft Fellowship Award (1999), the National Excellent Doctoral Dissertation Award of China (2003), and the award of the National Outstanding Youth Foundation of China (2004). He is on the editorial boards of Artificial Intelligence in Medicine (Elsevier), Knowledge and Information Systems (Springer), and the International Journal of Data Warehousing and Mining (Idea Group). He served as the organising chair of the 7th Chinese Workshop on Machine Learning (2000), program co-chair of the 9th Chinese Conference on Machine Learning (2004), and program committee member for numerous international conferences. He is the vice chair of the Artificial Intelligence & Pattern Recognition Society of the China Computer Federation, a councilor of the Chinese Association of Artificial Intelligence (CAAI), the chief secretary of the CAAI Machine Learning Society, and a member of the IEEE and the IEEE Computer Society.
Yang Yu received the B.Sc. degree in computer science from Nanjing University, China, in 2004. He has received awards including the China Computer World Scholarship (2004) and a scholarship for outstanding undergraduates. He is now a member of the LAMDA Group and will pursue his M.Sc. degree at the Department of Computer Science & Technology of Nanjing University beginning in September 2005, supervised by Prof. Zhi-Hua Zhou. His research interests are in machine learning and evolutionary computing.
About this article
Cite this article
Zhou, ZH., Yu, Y. Adapt Bagging to Nearest Neighbor Classifiers. J Comput Sci Technol 20, 48–54 (2005). https://doi.org/10.1007/s11390-005-0005-5