Abstract
Self-training is a well-known framework for semi-supervised learning, and selecting high-confidence samples is its key step. If high-confidence samples with incorrect labels are used to train the classifier, the errors accumulate over the iterations. To improve the quality of high-confidence samples, we propose a novel data editing technique termed Relative Node Graph Editing (RNGE). Concretely, mass estimation is used to compute the density and peak of each sample and to build a prototype tree that reveals the underlying spatial structure of the data. We then define a Relative Node Graph (RNG) for each sample. Finally, mislabeled samples in the candidate high-confidence set are identified by a hypothesis test based on the RNG. Combining these components, we propose a Robust Self-training Algorithm based on the Relative Node Graph (STRNG), which uses RNGE to identify mislabeled samples and edit them. Experimental results show that the proposed algorithm improves the performance of self-training.
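The abstract outlines a pipeline of pseudo-labeling, high-confidence candidate selection, graph-based editing, and retraining. The paper's actual RNG construction, built from mass-estimation densities and a prototype tree, cannot be reconstructed from the abstract alone, so the sketch below substitutes a k-nearest-neighbor graph and a SETRED-style cut-edge binomial test as the editing step; the function `edit_candidates`, the confidence threshold `conf`, and the significance level `alpha` are illustrative assumptions, not the authors' method.

```python
# A minimal sketch of self-training with graph-based data editing, assuming a
# k-NN graph and a cut-edge binomial test in place of the paper's RNG/prototype
# tree (which the abstract does not specify in enough detail to reproduce).
import numpy as np
from scipy.stats import binom
from sklearn.neighbors import KNeighborsClassifier, NearestNeighbors


def edit_candidates(X_lab, y_lab, X_cand, y_cand, k=5, alpha=0.05):
    """Reject candidates whose pseudo-label disagrees with its neighborhood
    more often than a binomial null hypothesis allows (cut-edge test)."""
    X_all = np.vstack([X_lab, X_cand])
    y_all = np.concatenate([y_lab, y_cand])
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_all)
    _, idx = nn.kneighbors(X_cand)
    n_lab = len(X_lab)
    keep = []
    for i, neigh in enumerate(idx):
        neigh = neigh[neigh != n_lab + i][:k]         # drop the self-match
        disagree = np.sum(y_all[neigh] != y_cand[i])  # number of "cut edges"
        # Null: under a correct label, a random neighbor differs with prob. p0.
        p0 = 1.0 - np.mean(y_all == y_cand[i])
        # One-sided test: improbably many cut edges -> likely mislabeled.
        p_value = binom.sf(disagree - 1, k, p0)       # P(X >= disagree)
        if p_value > alpha:
            keep.append(i)
    return np.array(keep, dtype=int)


def self_train(X_lab, y_lab, X_unl, n_iter=10, conf=0.9):
    """Iteratively pseudo-label, edit, and absorb high-confidence samples."""
    clf = KNeighborsClassifier(n_neighbors=3)
    for _ in range(n_iter):
        if len(X_unl) == 0:
            break
        clf.fit(X_lab, y_lab)
        proba = clf.predict_proba(X_unl)
        cand = np.where(proba.max(axis=1) >= conf)[0]  # candidate set
        if len(cand) == 0:
            break
        y_cand = clf.classes_[proba[cand].argmax(axis=1)]
        kept = edit_candidates(X_lab, y_lab, X_unl[cand], y_cand)
        if len(kept) == 0:
            break
        X_lab = np.vstack([X_lab, X_unl[cand][kept]])
        y_lab = np.concatenate([y_lab, y_cand[kept]])
        X_unl = np.delete(X_unl, cand[kept], axis=0)
    return clf.fit(X_lab, y_lab)


# Example usage with synthetic data (illustrative only):
# from sklearn.datasets import make_blobs
# X, y = make_blobs(n_samples=300, centers=3, random_state=0)
# model = self_train(X[:30], y[:30], X[30:])
```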
Data Availability
The data sets and experimental results generated and analyzed during the current study are available from https://github.com/511lab/STRNG.
Notes
http://archive.ics.uci.edu/ml/index.php.
http://www.uk.research.att.com/facedatabase.html.
Acknowledgements
The work was partially supported by the Gansu University Innovation Fund Project (2023B-94), the National Social Science Fund of China (Grant No. 20XTJ005), and the Central Government Funds for Guiding Local Science and Technology Development of China (Grant No. YDZX20216200001876).
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflict of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Wang, J., Duan, H., Zhang, C. et al. A robust self-training algorithm based on relative node graph. Appl Intell 55, 1 (2025). https://doi.org/10.1007/s10489-024-06062-0
Accepted:
Published:
DOI: https://doi.org/10.1007/s10489-024-06062-0