Abstract
Self-training is one of the most successful methodologies for semi-supervised classification. Mislabeling is its most challenging issue, and ensemble learning is a common technique for dealing with it: an ensemble classifier can alleviate mislabeling by improving prediction accuracy during the self-training process. However, most ensemble learning methods perform poorly inside self-training because it is difficult to train an effective ensemble classifier from a small amount of labeled data. Inspired by successful boosting methods, this paper introduces a boosting self-training framework based on instance generation with natural neighbors (BoostSTIG). BoostSTIG is compatible with most boosting and self-training methods: it can use boosting to alleviate the mislabeling of existing self-training methods by improving prediction accuracy in the self-training process. In addition, an instance generation procedure based on natural neighbors is proposed to enlarge the initial labeled data in BoostSTIG, which makes boosting methods more suitable for self-training. In experiments, we apply the BoostSTIG framework to 2 self-training methods and 4 boosting methods, and validate it against state-of-the-art technologies on real data sets. Extensive experiments show that BoostSTIG improves the performance of the tested self-training methods and trains an effective k nearest neighbor classifier.
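The abstract describes the generic self-training loop that BoostSTIG builds on: train a classifier on the labeled data, predict labels for the unlabeled data, move the most confidently predicted instances into the labeled set, and repeat. The sketch below is a minimal illustration of that loop only, not the paper's BoostSTIG method; the 1-NN base classifier and the distance-ratio confidence score are simplifications chosen for this example.

```python
import math

def predict_with_confidence(labeled, x):
    """1-NN prediction plus a crude confidence score: the distance to the
    nearest point of any OTHER class divided by the distance to the nearest
    point overall (>= 1; larger means a safer pseudo-label)."""
    by_dist = sorted(labeled, key=lambda pl: math.dist(pl[0], x))
    nearest_pt, label = by_dist[0]
    d_same = math.dist(nearest_pt, x)
    d_other = next((math.dist(p, x) for p, l in by_dist if l != label),
                   float("inf"))
    conf = d_other / d_same if d_same > 0 else float("inf")
    return label, conf

def self_train(labeled, unlabeled, min_conf=1.5):
    """Iteratively move confidently predicted points into the labeled set."""
    labeled, unlabeled = list(labeled), list(unlabeled)
    while unlabeled:
        preds = [(x, *predict_with_confidence(labeled, x)) for x in unlabeled]
        confident = [(x, y) for x, y, c in preds if c >= min_conf]
        if not confident:
            break  # nothing safe to pseudo-label; stop to limit mislabeling
        labeled.extend(confident)
        taken = {p for p, _ in confident}
        unlabeled = [x for x in unlabeled if x not in taken]
    return labeled

# Toy data: two well-separated clusters on a line.
seed = [((0.0, 0.0), "a"), ((10.0, 0.0), "b")]
pool = [(1.0, 0.0), (9.0, 0.0), (2.0, 0.0)]
final = self_train(seed, pool)
```

The confidence threshold is where mislabeling enters: an overconfident threshold pseudo-labels noisy points, which is exactly the failure mode BoostSTIG targets by strengthening the base classifier with boosting and by enlarging the initial labeled set with generated instances.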
Acknowledgments
This work was supported by the National Natural Science Foundation of China (61272194 and 61502060) and the Project of Chongqing Natural Science Foundation (cstc2019jcyj-msxmX0683).
Cite this article
Li, J., Zhu, Q. A boosting Self-Training Framework based on Instance Generation with Natural Neighbors for K Nearest Neighbor. Appl Intell 50, 3535–3553 (2020). https://doi.org/10.1007/s10489-020-01732-1