
Improving self-training with density peaks of data and cut edge weight statistic

  • Methodologies and Application
  • Published in: Soft Computing

Abstract

Semi-supervised classification has recently become an active research topic, and a number of algorithms, such as self-training, have been proposed to improve the performance of supervised classification by exploiting unlabeled data. Considering the influence of the spatial distribution of the data set and of mislabeled samples on the performance of the self-training method, this paper proposes an improved self-training algorithm based on density peaks and the cut edge weight statistic. First, representative unlabeled samples are selected for label prediction according to the spatial structure of the data, which is discovered by a clustering method based on density peaks. Second, the cut edge weight is used as a test statistic in a hypothesis test that identifies whether samples have been labeled correctly. Third, the labeled data set is gradually enlarged with the correctly labeled samples. These steps are iterated until all unlabeled samples are labeled. The improved self-training framework not only makes full use of spatial structure information but also addresses the problem that some samples may be classified incorrectly, so the classification accuracy of the algorithm improves considerably. Extensive experiments on benchmark data sets clearly illustrate the effectiveness of the proposed algorithm.




Acknowledgements

This research was supported by National Natural Science Foundation of China (No. 61573266).

Author information

Corresponding authors

Correspondence to Danni Wei or Youlong Yang.

Ethics declarations

Conflict of interest

The authors declare that they have no conflicts of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Informed consent

Informed consent was obtained from all individual participants included in the study.

Additional information

Communicated by V. Loia.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article


Cite this article

Wei, D., Yang, Y. & Qiu, H. Improving self-training with density peaks of data and cut edge weight statistic. Soft Comput 24, 15595–15610 (2020). https://doi.org/10.1007/s00500-020-04887-8
