Active learning based on coupled KNN pseudo pruning

  • Original Article
Neural Computing and Applications

Abstract

Annotating large amounts of data is expensive and time-consuming, and active learning is a suitable approach for reducing the annotation effort. This paper proposes a novel active learning approach, coupled K nearest neighbor pseudo pruning (CKNNPP), which queries examples with the KNNPP method. For each unlabeled sample, KNNPP uses the k nearest neighbor technique to find its k nearest neighbors among the labeled samples. When these k labeled neighbors do not all belong to the same class, the corresponding unlabeled sample is queried, given its correct label by the supervisor, and added to the labeled training set. In the opposite case, the unlabeled sample is not selected and is pruned; this is the pseudo pruning. The definition is inspired by K nearest neighbor pruning preprocessing. The samples selected by KNNPP are considered to lie on or near the optimal classification hyperplane, which is crucial for active learning. In particular, to avoid a shift of the optimal classification hyperplane after a queried sample is added, CKNNPP queries two samples with different class labels (a couple, annotated by the supervisor) by KNNPP and adds them to the training set simultaneously in each iteration. CKNNPP is simple, effective, and robust, and, compared with existing methods, it can handle classification problems with unbalanced datasets. The computational complexity of CKNNPP is also analyzed. Additionally, a new stopping criterion is applied in the proposed method, and the classifier is implemented by Lagrangian Support Vector Machines in the iterations of active learning. Finally, twelve UCI datasets, image datasets of aircraft, and a dataset of radar high-resolution range profiles are used to validate the feasibility and effectiveness of the proposed method. The results show that CKNNPP achieves superior performance compared with seven other state-of-the-art active learning approaches.
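
The KNNPP selection rule described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `knnpp_candidates`, the Euclidean distance, the tie-breaking by index order, and the toy data are all assumptions made for illustration. The coupled query of two differently labeled samples, the stopping criterion, and the Lagrangian SVM classifier are not covered here.

```python
import math
from typing import Hashable, List, Sequence


def knnpp_candidates(
    labeled_x: Sequence[Sequence[float]],
    labeled_y: Sequence[Hashable],
    unlabeled_x: Sequence[Sequence[float]],
    k: int,
) -> List[int]:
    """Return indices of unlabeled samples whose k nearest labeled
    neighbors do not all share one class (hypothetical helper)."""
    if len(labeled_x) != len(labeled_y):
        raise ValueError("labeled_x and labeled_y must have equal length")
    if not 1 <= k <= len(labeled_x):
        raise ValueError("k must be between 1 and the number of labeled samples")

    selected = []
    for i, u in enumerate(unlabeled_x):
        # Rank labeled samples by Euclidean distance to u (assumed metric).
        order = sorted(
            range(len(labeled_x)),
            key=lambda j: math.dist(u, labeled_x[j]),
        )
        neighbor_labels = {labeled_y[j] for j in order[:k]}
        if len(neighbor_labels) > 1:
            # Mixed neighborhood: candidate for a label query.
            selected.append(i)
        # Otherwise the sample is skipped ("pseudo pruned") this round.
    return selected


# Toy data (invented): two well-separated classes in the plane.
labeled_x = [(0, 0), (0, 1), (1, 0), (4, 0), (4, 1), (3, 1)]
labeled_y = [0, 0, 0, 1, 1, 1]
unlabeled_x = [(0.2, 0.3), (2.0, 0.5), (3.8, 0.6)]

print(knnpp_candidates(labeled_x, labeled_y, unlabeled_x, k=3))  # [1]
```

In this toy run only the middle point, which sits between the two classes, has labeled neighbors from both classes, so it is the only query candidate; the two points deep inside a class are pruned.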

Acknowledgments

The authors thank Wei-Da Zhou, who gave advice on our method in Sect. 4. We also thank Lie-feng Bo, who made many suggestions about this paper. Finally, we wish to acknowledge the anonymous reviewers, whose comments helped to improve our paper. Lin Xiong and Shasha Mao acknowledge the support of the National Natural Science Foundation of China under Grant Nos. 60702062, 60970067, and 60803097.

Author information

Corresponding author

Correspondence to Lin Xiong.

Appendix

1.1 Binary-class dataset for KNNPP

See Tables 10 and 11.

Table 10 Win/Tie/Loss counts of KNNPP versus the other methods with various numbers of queries based on paired t-tests at the 95% confidence level
Table 11 Win/Tie/Loss counts of KNNPP versus the other methods with various numbers of queries based on the Wilcoxon signed-rank test at the 95% confidence level

1.2 Multiclass dataset for KNNPP

See Tables 12 and 13.

Table 12 Win/Tie/Loss counts of KNNPP versus the other methods with various numbers of queries based on paired t-tests at the 95% confidence level
Table 13 Win/Tie/Loss counts of KNNPP versus the other methods with various numbers of queries based on the Wilcoxon signed-rank test at the 95% confidence level

About this article

Cite this article

Xiong, L., Jiao, L.C., Mao, S. et al. Active learning based on coupled KNN pseudo pruning. Neural Comput & Applic 21, 1669–1686 (2012). https://doi.org/10.1007/s00521-011-0611-9

  • DOI: https://doi.org/10.1007/s00521-011-0611-9
