Active learning based on coupled KNN pseudo pruning

Xiong, Lin; Jiao, L. C.; Mao, Shasha; Zhang, Li

doi:10.1007/s00521-011-0611-9

Original Article
Published: 03 May 2011

Volume 21, pages 1669–1686, (2012)

Neural Computing and Applications

Lin Xiong¹, L. C. Jiao¹, Shasha Mao¹ & Li Zhang¹

407 Accesses
4 Citations

Abstract

Annotating large amounts of data is expensive and time-consuming, and active learning is a suitable way to reduce the annotation effort. This paper proposes a novel active learning approach, coupled K nearest neighbor pseudo pruning (CKNNPP), which queries examples with the KNNPP method. For each unlabeled sample, KNNPP uses the k nearest neighbor technique to find its k nearest neighbors among the labeled samples. When these k labeled neighbors do not all belong to the same class, the corresponding unlabeled sample is queried, given its correct label by the supervisor, and added to the labeled training set. Otherwise, the unlabeled sample is not selected and is pruned; this is the pseudo pruning. The definition is inspired by K nearest neighbor pruning preprocessing. The samples selected by KNNPP are considered to lie near or on the optimal classification hyperplane, which is crucial for active learning. In particular, to avoid shifting the optimal classification hyperplane after a queried sample is added, the CKNNPP method queries two samples with different class labels (a couple, annotated by the supervisor) by KNNPP and adds them to the training set simultaneously in each iteration. Compared with existing methods, CKNNPP performs well, is simple, effective, and robust, and can handle classification problems with unbalanced datasets. The computational complexity of CKNNPP is then analyzed. In addition, a new stopping criterion is applied in the proposed method, and the classifier is implemented with Lagrangian Support Vector Machines in the iterations of active learning. Finally, twelve UCI datasets, image datasets of aircraft, and a dataset of radar high-resolution range profiles are used to validate the feasibility and effectiveness of the proposed method. The results show that CKNNPP achieves superior performance compared with seven other state-of-the-art active learning approaches.
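
The query rule described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names knnpp_candidates and query_couple, the Euclidean metric, the toy data, and the oracle are all assumptions made for illustration. In particular, the abstract does not say how the two samples of a couple are chosen, so the sketch simply takes the first two candidates whose supervisor labels differ.

```python
from math import dist
from typing import Callable, List, Optional, Sequence, Tuple

Point = Sequence[float]


def knnpp_candidates(
    labeled: List[Tuple[Point, int]],
    unlabeled: List[Point],
    k: int = 3,
) -> List[int]:
    """Return indices of unlabeled points whose k nearest labeled
    neighbors do not all share one class (the KNNPP query condition)."""
    selected = []
    for i, x in enumerate(unlabeled):
        # k nearest labeled neighbors of x; Euclidean distance is an assumption.
        neighbors = sorted(labeled, key=lambda pair: dist(pair[0], x))[:k]
        if len({label for _, label in neighbors}) > 1:
            selected.append(i)  # mixed neighborhood: candidate for a query
        # otherwise the point is skipped, i.e. "pseudo pruned"
    return selected


def query_couple(
    candidates: List[int],
    unlabeled: List[Point],
    oracle: Callable[[Point], int],
) -> Optional[List[Tuple[int, int]]]:
    """Ask the oracle (supervisor) about candidates in order and return the
    first two (index, label) pairs with different labels, or None.
    The selection order is a hypothetical choice, not taken from the paper."""
    first = None
    for i in candidates:
        label = oracle(unlabeled[i])
        if first is None:
            first = (i, label)
        elif label != first[1]:
            return [first, (i, label)]
    return None


# Toy data: class 0 on the left, class 1 on the right.
labeled = [((0.0, 0.0), 0), ((1.0, 0.0), 0), ((2.0, 0.0), 0),
           ((5.0, 0.0), 1), ((6.0, 0.0), 1), ((7.0, 0.0), 1)]
unlabeled = [(0.5, 0.0), (3.4, 0.0), (6.5, 0.0), (3.8, 0.0)]

candidates = knnpp_candidates(labeled, unlabeled, k=3)
print(candidates)  # [1, 3]

# Hypothetical supervisor: the true boundary sits at x = 3.5.
couple = query_couple(candidates, unlabeled, lambda p: 0 if p[0] < 3.5 else 1)
print(couple)  # [(1, 0), (3, 1)]
```

Only the two points near the class boundary are selected; the points deep inside either class have uniform neighborhoods and are skipped. Adding one sample from each class per iteration is what the abstract describes as keeping the hyperplane from drifting toward one side.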

Acknowledgments

The authors thank Wei-Da Zhou, who gave advice on our method in Sect. 4. We also thank Lie-feng Bo, who made many suggestions about this paper. Finally, we wish to acknowledge the anonymous reviewers, whose comments helped to improve our paper. Lin Xiong and Shasha Mao acknowledge the support of the National Natural Science Foundation of China under Grant Nos. 60702062, 60970067, and 60803097.

Author information

Authors and Affiliations

Key Laboratory of Intelligent Perception and Image Understanding of Ministry of Education of China, Institute of Intelligent Information Processing, Xidian University, 710071, Xi’an, People’s Republic of China
Lin Xiong, L. C. Jiao, Shasha Mao & Li Zhang

Corresponding author

Correspondence to Lin Xiong.

Appendix

1.1 Binary-class dataset for KNNPP

See Tables 10 and 11.

Table 10 Win/Tie/Loss counts of KNNPP versus the other methods with various numbers of queries based on paired t tests at 95% significance level

Table 11 Win/Tie/Loss counts of KNNPP versus the other methods with various numbers of queries based on Wilcoxon signed ranks test at 95% significance level

1.2 Multiclass dataset for KNNPP

See Tables 12 and 13.

Table 12 Win/Tie/Loss counts of KNNPP versus the other methods with various numbers of queries based on paired t tests at 95% significance level

Table 13 Win/Tie/Loss counts of KNNPP versus the other methods with various numbers of queries based on Wilcoxon signed ranks test at 95% significance level

About this article

Cite this article

Xiong, L., Jiao, L.C., Mao, S. et al. Active learning based on coupled KNN pseudo pruning. Neural Comput & Applic 21, 1669–1686 (2012). https://doi.org/10.1007/s00521-011-0611-9

Received: 31 August 2010
Accepted: 15 April 2011
Published: 03 May 2011
Issue Date: October 2012
DOI: https://doi.org/10.1007/s00521-011-0611-9
