Abstract
Semi-supervised classification has recently become an active research topic, and a number of algorithms, such as self-training, have been proposed to improve the performance of supervised classification using unlabeled data. Considering the influence of the spatial distribution of the data set and of mislabeled samples on classification performance, this paper proposes an improved self-training algorithm based on density peaks and the cut edge weight statistic. First, representative unlabeled samples are selected for label prediction according to the spatial structure of the data, which is discovered by a clustering method based on density peaks. Second, the cut edge weight is used as a statistic in a hypothesis test that identifies whether samples have been labeled correctly. Third, the labeled data set is gradually enlarged with the correctly labeled samples. These steps are iterated until all unlabeled samples are labeled. The improved self-training framework not only makes full use of spatial structure information but also mitigates the problem that some samples may be classified incorrectly, thereby substantially improving classification accuracy. Extensive experiments on benchmark data sets clearly demonstrate the effectiveness of the proposed algorithm.
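The three steps described in the abstract can be sketched in a few dozen lines. The sketch below is an illustrative simplification, not the authors' implementation: local density and delta follow the density-peaks definition of Rodriguez and Laio (2014); the cut-edge-weight hypothesis test is stood in for by a weighted disagreement ratio over the k nearest labeled neighbours with a fixed threshold `theta` (both names are choices made here, not taken from the paper); label prediction uses a 1-NN classifier, and samples failing the check simply stay unlabeled rather than being re-examined in later iterations.

```python
import numpy as np

def density_peaks(X, dc):
    """Local density rho (Gaussian kernel of width dc) and delta
    (distance to the nearest point of higher density), as defined by
    Rodriguez & Laio (2014). Returns rho, delta and the distance matrix."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    rho = np.exp(-(d / dc) ** 2).sum(axis=1) - 1.0  # subtract self-term
    n = len(X)
    delta = np.empty(n)
    for i in range(n):
        higher = np.where(rho > rho[i])[0]
        delta[i] = d[i].max() if higher.size == 0 else d[i, higher].min()
    return rho, delta, d

def cut_edge_ratio(i, label, y, d, k):
    """Weighted fraction of i's k nearest labeled neighbours that
    disagree with `label` -- a simplified stand-in for the cut-edge-weight
    statistic (Muhlenbach et al. 2004); weights decay with distance."""
    labeled = np.where(y >= 0)[0]
    labeled = labeled[labeled != i]
    if labeled.size == 0:
        return 0.0
    nn = labeled[np.argsort(d[i, labeled])[:k]]
    w = 1.0 / (d[i, nn] + 1e-12)
    return w[y[nn] != label].sum() / w.sum()

def self_training(X, y, dc=1.0, k=3, theta=0.5):
    """y holds class indices for labeled samples and -1 for unlabeled.
    Unlabeled samples are visited in decreasing density (denser points
    are treated as more representative), labeled by 1-NN against the
    current labeled set, and accepted only if the cut-edge check passes."""
    y = y.copy()
    rho, delta, d = density_peaks(X, dc)
    for i in np.argsort(-rho):
        if y[i] >= 0:
            continue
        labeled = np.where(y >= 0)[0]
        label = y[labeled[np.argmin(d[i, labeled])]]  # 1-NN prediction
        if cut_edge_ratio(i, label, y, d, k) <= theta:
            y[i] = label  # enlarge the labeled set
    return y
```

On two well-separated Gaussian blobs with one labeled seed each, this propagates the correct label to every point; the paper's full method additionally exploits delta when choosing representatives and iterates until no sample remains unlabeled.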
Acknowledgements
This research was supported by National Natural Science Foundation of China (No. 61573266).
Ethics declarations
Conflict of interest
The authors declare that they have no conflicts of interest.
Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Informed consent
Informed consent was obtained from all individual participants included in the study.
Additional information
Communicated by V. Loia.
Cite this article
Wei, D., Yang, Y. & Qiu, H. Improving self-training with density peaks of data and cut edge weight statistic. Soft Comput 24, 15595–15610 (2020). https://doi.org/10.1007/s00500-020-04887-8