Abstract
Self-training is one of the most successful methodologies for semi-supervised classification. Mislabeling is its most challenging issue, and ensemble learning is a common technique for dealing with it: an ensemble classifier can alleviate mislabeling by improving prediction accuracy during the self-training process. However, most ensemble learning methods perform poorly inside self-training because it is difficult to train an effective ensemble classifier from a small amount of labeled data. Inspired by successful boosting methods, this paper introduces a boosting self-training framework based on instance generation with natural neighbors (BoostSTIG). BoostSTIG is compatible with most boosting and self-training methods: it can use boosting to alleviate the mislabeling of existing self-training methods by improving prediction accuracy in the self-training process. In addition, an instance generation procedure based on natural neighbors is proposed to enlarge the initial labeled data in BoostSTIG, which makes boosting methods more suitable for self-training. In experiments, we apply the BoostSTIG framework to 2 self-training methods and 4 boosting methods, and validate it against state-of-the-art technologies on real data sets. Extensive experiments show that BoostSTIG improves the performance of the tested self-training methods and trains an effective k nearest neighbor classifier.
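The abstract describes the generic self-training loop that BoostSTIG builds on: train a classifier on the labeled data, predict labels for the unlabeled data, move the most confidently predicted instances into the labeled set, and repeat. The sketch below is a minimal illustration of that loop only, not the paper's BoostSTIG method; the 1-NN base classifier and the distance-ratio confidence score are simplifications chosen for this example.

```python
import math

def predict_with_confidence(labeled, x):
    """1-NN prediction plus a crude confidence score: the distance to the
    nearest point of any OTHER class divided by the distance to the nearest
    point overall (>= 1; larger means a safer pseudo-label)."""
    by_dist = sorted(labeled, key=lambda pl: math.dist(pl[0], x))
    nearest_pt, label = by_dist[0]
    d_same = math.dist(nearest_pt, x)
    d_other = next((math.dist(p, x) for p, l in by_dist if l != label),
                   float("inf"))
    conf = d_other / d_same if d_same > 0 else float("inf")
    return label, conf

def self_train(labeled, unlabeled, min_conf=1.5):
    """Iteratively move confidently predicted points into the labeled set."""
    labeled, unlabeled = list(labeled), list(unlabeled)
    while unlabeled:
        preds = [(x, *predict_with_confidence(labeled, x)) for x in unlabeled]
        confident = [(x, y) for x, y, c in preds if c >= min_conf]
        if not confident:
            break  # nothing safe to pseudo-label; stop to limit mislabeling
        labeled.extend(confident)
        taken = {p for p, _ in confident}
        unlabeled = [x for x in unlabeled if x not in taken]
    return labeled

# Toy data: two well-separated clusters on a line.
seed = [((0.0, 0.0), "a"), ((10.0, 0.0), "b")]
pool = [(1.0, 0.0), (9.0, 0.0), (2.0, 0.0)]
final = self_train(seed, pool)
```

The confidence threshold is where mislabeling enters: an overconfident threshold pseudo-labels noisy points, which is exactly the failure mode BoostSTIG targets by strengthening the base classifier with boosting and by enlarging the initial labeled set with generated instances.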
Acknowledgments
This work was supported by the National Natural Science Foundation of China (61272194 and 61502060) and the Project of Chongqing Natural Science Foundation (cstc2019jcyj-msxmX0683).
Cite this article
Li, J., Zhu, Q. A boosting Self-Training Framework based on Instance Generation with Natural Neighbors for K Nearest Neighbor. Appl Intell 50, 3535–3553 (2020). https://doi.org/10.1007/s10489-020-01732-1