
A boosting Self-Training Framework based on Instance Generation with Natural Neighbors for K Nearest Neighbor

Published in: Applied Intelligence

Abstract

The semi-supervised self-training method is one of the most successful approaches to semi-supervised classification. Mislabeling is the most challenging issue in self-training methods, and ensemble learning is a common technique for dealing with it: an ensemble classifier can solve or alleviate mislabeling by improving prediction accuracy during the self-training process. However, most ensemble learning methods may not perform well inside self-training because it is difficult to train an effective ensemble classifier from only a small number of labeled instances. Inspired by the success of boosting, we introduce a new boosting self-training framework based on instance generation with natural neighbors (BoostSTIG). BoostSTIG is compatible with most boosting methods and self-training methods; it can use most boosting methods to solve or alleviate the mislabeling of existing self-training methods by improving prediction accuracy in the self-training process. In addition, an instance generation method based on natural neighbors is proposed to enlarge the initial labeled data in BoostSTIG, which makes boosting methods better suited to self-training. In experiments, we apply the BoostSTIG framework to 2 self-training methods and 4 boosting methods, and validate it against several state-of-the-art techniques on real data sets. Intensive experiments show that BoostSTIG can improve the performance of the tested self-training methods and train an effective k nearest neighbor classifier.
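To make the workflow described above concrete, here is a minimal sketch written with NumPy and scikit-learn. It is not the authors' BoostSTIG implementation: the helper names (natural_neighbors, generate_instances, boost_self_training), the choice of AdaBoost as the boosting method, the SMOTE-like interpolation, and the 0.9 confidence threshold are illustrative assumptions. The sketch only shows the general idea: enlarge the small labeled set with synthetic instances built from natural neighbors, run a self-training loop whose pseudo-labels come from a boosted ensemble, and finally train a k nearest neighbor classifier on everything labeled so far.

```python
# Illustrative sketch only; assumes NumPy arrays and scikit-learn.
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.neighbors import KNeighborsClassifier, NearestNeighbors


def natural_neighbors(X):
    """Adaptive neighborhood search in the spirit of Zhu et al. (2016):
    grow the neighborhood size r until every point has at least one
    reverse r-nearest neighbor, then keep the mutual (natural) pairs."""
    n = len(X)
    _, order = NearestNeighbors(n_neighbors=n).fit(X).kneighbors(X)  # order[i, 0] is i itself
    neighbors = [set() for _ in range(n)]
    reverse = np.zeros(n, dtype=int)
    for r in range(1, n):
        for i in range(n):
            j = int(order[i, r])
            neighbors[i].add(j)
            reverse[j] += 1
        if np.all(reverse > 0):                  # every point is someone's neighbor
            break
    return [ni & {k for k in range(n) if i in neighbors[k]}
            for i, ni in enumerate(neighbors)]


def generate_instances(X_l, y_l, rng):
    """Enlarge the labeled set by interpolating each labeled point with its
    natural neighbors of the same class (SMOTE-like interpolation)."""
    new_X, new_y = [], []
    for i, neigh in enumerate(natural_neighbors(X_l)):
        for j in neigh:
            if y_l[j] == y_l[i]:
                lam = rng.uniform()              # random point on the segment i--j
                new_X.append(X_l[i] + lam * (X_l[j] - X_l[i]))
                new_y.append(y_l[i])
    if not new_X:
        return X_l, y_l
    return np.vstack([X_l, new_X]), np.concatenate([y_l, new_y])


def boost_self_training(X_l, y_l, X_u, rounds=10, conf=0.9, seed=0):
    """Self-training loop whose pseudo-labels come from a boosted ensemble;
    returns a kNN classifier trained on all data labeled so far."""
    rng = np.random.default_rng(seed)
    X_l, y_l = generate_instances(X_l, y_l, rng)          # 1. enlarge labeled data
    for _ in range(rounds):
        if len(X_u) == 0:
            break
        booster = AdaBoostClassifier(n_estimators=50, random_state=seed).fit(X_l, y_l)
        proba = booster.predict_proba(X_u)
        keep = proba.max(axis=1) >= conf                  # 2. confident predictions only
        if not keep.any():
            break
        X_l = np.vstack([X_l, X_u[keep]])                 # 3. move them to labeled data
        y_l = np.concatenate([y_l, booster.classes_[proba[keep].argmax(axis=1)]])
        X_u = X_u[~keep]
    return KNeighborsClassifier(n_neighbors=3).fit(X_l, y_l)  # 4. final kNN

# Hypothetical usage: knn = boost_self_training(X_labeled, y_labeled, X_unlabeled)
```

A notable property of the natural neighbor search is that the neighborhood size is found adaptively rather than fixed by a user-chosen k, which is what makes it attractive for generating synthetic instances when only a handful of labeled points are available.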


Notes

  1. http://archive.ics.uci.edu/ml/datasets.html.


Acknowledgments

This work was supported by the National Natural Science Foundation of China (61272194 and 61502060) and the Project of Chongqing Natural Science Foundation (cstc2019jcyj-msxmX0683).

Author information

Corresponding author

Correspondence to Qingsheng Zhu.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Li, J., Zhu, Q. A boosting Self-Training Framework based on Instance Generation with Natural Neighbors for K Nearest Neighbor. Appl Intell 50, 3535–3553 (2020). https://doi.org/10.1007/s10489-020-01732-1


