An Efficient Drift Detection Module for Semi-supervised Data Classification in Non-stationary Environments

  • Conference paper
Intelligent Systems (BRACIS 2022)

Abstract

In the data stream (DS) context, data arrives at high speed and must be processed as soon as possible. Furthermore, there is no guarantee that all data is labelled. Consequently, semi-supervised learning (SSL) is a natural approach to building an effective model in this context. Dynamic Data Stream Learning (DyDaSL) is a framework that uses an SSL algorithm to build a model able to classify instances in a data stream. In this paper, an extension of the DyDaSL drift detection module is proposed. Its main aim is to make drift detection more flexible and, in turn, to improve the data stream process as a whole. An empirical analysis is conducted using real and synthetic datasets. The proposed approach achieved better results than the original one and some state-of-the-art drift detection methods.
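The abstract does not describe how the proposed module works, so the following is only a generic illustration of what a drift detector does on a stream. It is a minimal sketch of the classical Page-Hinkley test applied to a sequence of per-instance errors (0 for a correct prediction, 1 for a mistake), and it is not the DyDaSL module or its extension. The class name, the helper `detect_drifts`, and the values of `delta`, `threshold` and `min_samples` are assumptions chosen for illustration.

```python
from typing import Iterable, List


class PageHinkley:
    """Page-Hinkley test for an upward shift in the mean of a stream.

    Hypothetical illustration only; parameter defaults are assumptions.
    """

    def __init__(self, delta: float = 0.005, threshold: float = 5.0,
                 min_samples: int = 30) -> None:
        self.delta = delta              # tolerated magnitude of change
        self.threshold = threshold      # alarm level (often written lambda)
        self.min_samples = min_samples  # warm-up before alarms are allowed
        self.reset()

    def reset(self) -> None:
        self.n = 0          # number of values seen since the last reset
        self.mean = 0.0     # running mean of the observed values
        self.cum = 0.0      # cumulative deviation from the running mean
        self.min_cum = 0.0  # smallest cumulative deviation seen so far

    def update(self, x: float) -> bool:
        """Consume one value; return True if a drift is signalled."""
        self.n += 1
        self.mean += (x - self.mean) / self.n
        self.cum += x - self.mean - self.delta
        self.min_cum = min(self.min_cum, self.cum)
        drift = (self.n >= self.min_samples
                 and self.cum - self.min_cum > self.threshold)
        if drift:
            # Start over so the statistics describe the new concept.
            self.reset()
        return drift


def detect_drifts(errors: Iterable[float]) -> List[int]:
    """Return the stream positions (0-based) where a drift is signalled."""
    detector = PageHinkley()
    return [i for i, e in enumerate(errors) if detector.update(e)]


if __name__ == "__main__":
    # Toy stream: the classifier is always right, then always wrong.
    stream = [0.0] * 200 + [1.0] * 200
    print(detect_drifts(stream))
```

In a semi-supervised stream the error signal is only available for the labelled instances, so a detector of this kind would see a sparser input than in the fully supervised setting. A fixed `threshold` is also the kind of rigid setting that a more flexible detection scheme would aim to relax.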

Acknowledgment

This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES) - Finance Code 001.

Author information

Corresponding author

Correspondence to Arthur C. Gorgônio.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Gorgônio, A.C., Barreto, C.A.d.S., da Costa, S.J.M.S., Canuto, A.M.d.P., Vale, K.M.O., Gorgônio, F.L. (2022). An Efficient Drift Detection Module for Semi-supervised Data Classification in Non-stationary Environments. In: Xavier-Junior, J.C., Rios, R.A. (eds) Intelligent Systems. BRACIS 2022. Lecture Notes in Computer Science, vol 13653. Springer, Cham. https://doi.org/10.1007/978-3-031-21686-2_3

  • DOI: https://doi.org/10.1007/978-3-031-21686-2_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-21685-5

  • Online ISBN: 978-3-031-21686-2

  • eBook Packages: Computer Science, Computer Science (R0)
