Abstract
Annotating large amounts of data is expensive and time-consuming, and active learning is a suitable way to reduce the annotation effort. This paper proposes a novel active learning approach, coupled K nearest neighbor pseudo pruning (CKNNPP), which queries examples using the KNNPP method. For each unlabeled sample, KNNPP uses the k-nearest-neighbor technique to find its k nearest neighbors among the labeled samples. When these k labeled samples do not all belong to the same class, the corresponding unlabeled sample is queried, given its correct label by the supervisor, and added to the labeled training set. Otherwise, the unlabeled sample is not selected and is pruned; this is the pseudo pruning. The definition is inspired by K nearest neighbor pruning preprocessing. The samples selected by KNNPP are considered to lie near or on the optimal classification hyperplane, which is crucial for active learning. To keep the optimal classification hyperplane from shifting after a single queried sample is added, CKNNPP queries two samples with different class labels (a couple, annotated by the supervisor) by KNNPP and adds them to the training set simultaneously in each iteration. CKNNPP is simple, effective, and robust, and, compared with existing methods, it can handle classification problems with unbalanced datasets. The computational complexity of CKNNPP is then analyzed. Additionally, a new stopping criterion is applied in the proposed method, and the classifier in each iteration of active learning is implemented by Lagrangian Support Vector Machines. Finally, twelve UCI datasets, image datasets of aircraft, and a dataset of radar high-resolution range profiles are used to validate the feasibility and effectiveness of the proposed method. The results show that CKNNPP achieves superior performance compared with seven other state-of-the-art active learning approaches.
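The selection rule described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the Euclidean distance, the toy data, and in particular the way the couple is chosen (the first two candidates whose supervisor labels differ) are assumptions made only for illustration. The stopping criterion and the Lagrangian SVM classifier are omitted.

```python
import math

def knn_labels(x, labeled_X, labeled_y, k):
    """Labels of the k labeled samples nearest to x (Euclidean distance, assumed)."""
    order = sorted(range(len(labeled_X)), key=lambda i: math.dist(x, labeled_X[i]))
    return [labeled_y[i] for i in order[:k]]

def knnpp_candidates(unlabeled_X, labeled_X, labeled_y, k=3):
    """Indices of unlabeled samples whose k labeled neighbors disagree on class.

    Samples whose neighbors all share one class are skipped (pseudo pruned).
    """
    return [j for j, x in enumerate(unlabeled_X)
            if len(set(knn_labels(x, labeled_X, labeled_y, k))) > 1]

def coupled_query(candidates, oracle):
    """Hypothetical couple selection: ask the supervisor (oracle) about the
    candidates in order and return the first two with different labels as
    (index, label) pairs, or None if no such pair exists."""
    first_with_label = {}
    for j in candidates:
        first_with_label.setdefault(oracle(j), j)
        if len(first_with_label) == 2:
            return sorted((idx, lab) for lab, idx in first_with_label.items())
    return None

# Toy data (assumed): class 0 on the left, class 1 on the right.
labeled_X = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (4.0, 0.0), (5.0, 0.0)]
labeled_y = [0, 0, 0, 1, 1, 1]
unlabeled_X = [(0.4, 0.0), (2.4, 0.0), (2.6, 0.0), (4.6, 0.0)]
true_y = [0, 0, 1, 1]  # known only to the supervisor

candidates = knnpp_candidates(unlabeled_X, labeled_X, labeled_y, k=3)
couple = coupled_query(candidates, lambda j: true_y[j])
print(candidates)  # samples near the class boundary
print(couple)      # one queried sample per class
```

In this toy setting only the two samples near the class boundary are selected, and the couple contains one sample from each class, so the training set grows by one example per class in the iteration.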
Acknowledgments
The authors thank Wei-Da Zhou for advice on the method in Sect. 4 and Lie-feng Bo for many suggestions about this paper. We also thank the anonymous reviewers, whose comments helped to improve the paper. Lin Xiong and Shasha Mao acknowledge the support of the National Natural Science Foundation of China under Grant Nos. 60702062, 60970067, and 60803097.
Appendix
1.1 Binary-class dataset for KNNPP
1.2 Multiclass dataset for KNNPP
Cite this article
Xiong, L., Jiao, L.C., Mao, S. et al. Active learning based on coupled KNN pseudo pruning. Neural Comput & Applic 21, 1669–1686 (2012). https://doi.org/10.1007/s00521-011-0611-9
DOI: https://doi.org/10.1007/s00521-011-0611-9