Skip to main content

On Handling Missing Values in Data Stream Mining Algorithms Based on the Restricted Boltzmann Machine

  • Conference paper
  • First Online:
Neural Information Processing (ICONIP 2019)

Abstract

This paper addresses the issue of data stream mining using the Restricted Boltzmann Machine (RBM). Recently, it was demonstrated that the RBM can be useful as a concept drift detector in data streams with time-changing probability density. In this paper, we consider another problem which often occurs in real-life data streams, i.e. incomplete data. We propose two modifications of the RBM learning algorithms to make them able to handle missing values. The first one inserts an additional procedure before the positive phase of the Contrastive Divergence. This procedure aims at inferring the missing values in the visible layer by performing a fixed number of Gibbs steps. The second modification introduces dimension-dependent sizes of minibatches in the stochastic gradient descent method. The proposed methods are verified experimentally, demonstrating their usability for concept drift detection in data streams with incomplete data.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Subscribe and save

Springer+ Basic
$34.99 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Similar content being viewed by others

References

  1. Andrieu, C., de Freitas, N., Doucet, A., Jordan, M.I.: An introduction to MCMC for machine learning. Mach. Learn. 50(1), 5–43 (2003)

    Article  Google Scholar 

  2. Bengio, Y.: Learning deep architectures for AI. Found. Trends Mach. Learn. 2(1), 1–127 (2009)

    Article  MathSciNet  Google Scholar 

  3. Devi, V.S., Meena, L.: Parallel MCNN (PMCNN) with application to prototype selection on large and streaming data. J. Artif. Intell. Soft Comput. Res. 7(3), 155–169 (2017)

    Article  Google Scholar 

  4. Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, Cambridge (2016). http://www.deeplearningbook.org

    MATH  Google Scholar 

  5. Hinton, G.E.: Training products of experts by minimizing contrastive divergence. Neural Comput. 14(8), 1771–1800 (2002)

    Article  Google Scholar 

  6. Hinton, G.E.: A practical guide to training restricted Boltzmann machines. In: Montavon, G., Orr, G.B., Müller, K.-R. (eds.) Neural Networks: Tricks of the Trade. LNCS, vol. 7700, pp. 599–619. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-35289-8_32

    Chapter  Google Scholar 

  7. Hinton, G.E., Sejnowski, T.J., Ackley, D.H.: Boltzmann machines: Constraint satisfaction networks that learn. Technical report CMU-CS-84-119, Computer Science Department, Carnegie Mellon University, Pittsburgh, PA (1984)

    Google Scholar 

  8. Jaworski, M., Duda, P., Rutkowski, L.: On applying the restricted Boltzmann machine to active concept drift detection. In: Proceedings of the 2017 IEEE Symposium Series on Computational Intelligence, Honolulu, USA, pp. 3512–3519 (2017)

    Google Scholar 

  9. Jaworski, M., Duda, P., Rutkowski, L.: Concept drift detection in streams of labelled data using the restricted Boltzmann machine. In: 2018 International Joint Conference on Neural Networks (IJCNN), pp. 1–7 (2018)

    Google Scholar 

  10. Jaworski, M., Duda, P., Rutkowski, L.: New splitting criteria for decision trees in stationary data streams. IEEE Trans. Neural Netw. Learn. Syst. 29(6), 2516–2529 (2018)

    Article  MathSciNet  Google Scholar 

  11. Jaworski, M., Rutkowski, L., Duda, P., Cader, A.: Resource-aware data stream mining using the restricted Boltzmann machine. In: Rutkowski, L., Scherer, R., Korytkowski, M., Pedrycz, W., Tadeusiewicz, R., Zurada, J.M. (eds.) ICAISC 2019. LNCS (LNAI), vol. 11509, pp. 384–396. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-20915-5_35

    Chapter  Google Scholar 

  12. Krawczyk, B., Cano, A.: Online ensemble learning with abstaining classifiers for drifting and noisy data streams. Appl. Soft Comput. 68, 677–692 (2018)

    Article  Google Scholar 

  13. LeCun, Y., Cortes, C.: MNIST handwritten digit database (2010). http://yann.lecun.com/exdb/mnist/

  14. Lemaire, V., Salperwyck, C., Bondu, A.: A survey on supervised classification on data streams. In: Zimányi, E., Kutsche, R.-D. (eds.) eBISS 2014. LNBIP, vol. 205, pp. 88–125. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-17551-5_4

    Chapter  Google Scholar 

  15. Ramirez-Gallego, S., Krawczyk, B., García, S., Woźniak, M., Herrera, F.: A survey on data preprocessing for data stream mining: current status and future directions. Neurocomputing 239, 39–57 (2017)

    Article  Google Scholar 

  16. Rutkowski, L., Jaworski, M., Duda, P.: Stream Data Mining: Algorithms and Their Probabilistic Properties. SBD, vol. 56. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-13962-9

    Book  MATH  Google Scholar 

  17. Zliobaite, I., Bifet, A., Pfahringer, B., Holmes, G.: Active learning with drifting streaming data. IEEE Trans. Neural Netw. Learn. Syst. 25(1), 27–39 (2014)

    Article  Google Scholar 

Download references

Acknowledgments

The project financed under the program of the Minister of Science and Higher Education under the name “Regional Initiative of Excellence” in the years 2019–2022 project number 020/RID/2018/19, the amount of financing 12,000,000 PLN. This work was also supported by the Polish National Science Centre under grant no. 2017/27/B/ST6/02852.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Maciej Jaworski .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Jaworski, M., Duda, P., Rutkowska, D., Rutkowski, L. (2019). On Handling Missing Values in Data Stream Mining Algorithms Based on the Restricted Boltzmann Machine. In: Gedeon, T., Wong, K., Lee, M. (eds) Neural Information Processing. ICONIP 2019. Communications in Computer and Information Science, vol 1143. Springer, Cham. https://doi.org/10.1007/978-3-030-36802-9_37

Download citation

  • DOI: https://doi.org/10.1007/978-3-030-36802-9_37

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-36801-2

  • Online ISBN: 978-3-030-36802-9

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics