Monitoring Incremental Histogram Distribution for Change Detection in Data Streams

  • Conference paper
Knowledge Discovery from Sensor Data (Sensor-KDD 2008)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5840)

Abstract

Histograms are a common technique for density estimation and have been widely used as a tool in exploratory data analysis. Learning histograms from static, stationary data is a well-known topic. Nevertheless, very few works discuss this problem for a continuous flow of data generated in dynamic environments.

The scope of this paper is the detection of changes in high-speed, time-changing data streams. To address this problem, we construct histograms able to process each example once, at the rate at which examples arrive. The main goal of this work is to continuously maintain a histogram consistent with the current state of nature. We study strategies to detect changes in the distribution generating the examples and to adapt the histogram to the most recent data by forgetting outdated data. We use the Partition Incremental Discretization algorithm, which was designed to learn histograms from high-speed data streams.
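As an illustration of one-pass histogram maintenance with forgetting, the following is a minimal sketch only. It is not the Partition Incremental Discretization algorithm: it assumes a fixed value range and equal-width bins, and the class name `StreamingHistogram` and its methods are hypothetical names chosen for this example.

```python
class StreamingHistogram:
    """Equal-width histogram updated one example at a time.

    Simplifying assumptions (not taken from the paper): the value range
    [low, high) and the number of bins are fixed in advance.
    """

    def __init__(self, low, high, n_bins):
        self.low = low
        self.n_bins = n_bins
        self.width = (high - low) / n_bins
        self.counts = [0] * n_bins
        self.total = 0

    def update(self, x):
        # Each example is seen once; only its bin counter is touched.
        idx = int((x - self.low) / self.width)
        # Clamp out-of-range values into the first or last bin.
        idx = min(max(idx, 0), self.n_bins - 1)
        self.counts[idx] += 1
        self.total += 1

    def frequencies(self):
        # Relative frequencies; all zeros while the histogram is empty.
        if self.total == 0:
            return [0.0] * self.n_bins
        return [c / self.total for c in self.counts]

    def forget(self):
        # Drop outdated data, e.g. after a change has been signalled.
        self.counts = [0] * self.n_bins
        self.total = 0


hist = StreamingHistogram(low=0.0, high=10.0, n_bins=5)
for value in [1.0, 3.0, 3.5, 9.9, 12.0]:
    hist.update(value)
```

Resetting all counters is the crudest form of forgetting; it is used here only to keep the sketch short.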

We present a method to detect when a change occurs in the distribution generating the examples. The basic idea is to monitor distributions from two different time windows: the reference window, reflecting the distribution observed in the past, and the current window, which receives the most recent data. The current window is cumulative and can have a fixed or an adaptive step, depending on the distance between the distributions. We compare the two distributions using the Kullback-Leibler divergence, defining a threshold for the change detection decision based on the asymmetry of this measure.
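The two-window comparison can be sketched as follows. This is a hedged illustration rather than the paper's procedure: the abstract does not give the exact decision rule, so the code assumes that a change is signalled when the absolute difference between the two directions of the Kullback-Leibler divergence exceeds a threshold. The function names, the smoothing constant `eps`, and the threshold value are assumptions made for this example.

```python
import math


def kl_divergence(p, q, eps=1e-9):
    """KL(p || q) for two discrete distributions over the same bins."""
    # Smooth and renormalise so empty bins do not cause log(0) or
    # division by zero (an assumption of this sketch).
    p_s = [pi + eps for pi in p]
    q_s = [qi + eps for qi in q]
    p_sum, q_sum = sum(p_s), sum(q_s)
    return sum(
        (pi / p_sum) * math.log((pi / p_sum) / (qi / q_sum))
        for pi, qi in zip(p_s, q_s)
    )


def change_detected(reference, current, threshold=0.1):
    """Assumed rule: flag a change when the KL asymmetry is large."""
    forward = kl_divergence(reference, current)
    backward = kl_divergence(current, reference)
    return abs(forward - backward) > threshold


reference_window = [0.5, 0.5, 0.0, 0.0]  # bin frequencies seen in the past
current_window = [0.1, 0.1, 0.4, 0.4]    # bin frequencies of recent data
alarm = change_detected(reference_window, current_window)
```

The inputs are bin frequencies over a shared set of bins, such as those produced by an incrementally maintained histogram. The threshold of 0.1 is arbitrary and would need tuning for real data.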

We evaluate our algorithm on controlled artificial data sets and compare the proposed approach with nonparametric tests. We also present results on real-world data sets from industrial and medical domains. These results suggest that an adaptive window step yields a high probability of change detection and faster detection rates, with few false-positive alarms.

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Sebastião, R., Gama, J., Rodrigues, P.P., Bernardes, J. (2010). Monitoring Incremental Histogram Distribution for Change Detection in Data Streams. In: Gaber, M.M., Vatsavai, R.R., Omitaomu, O.A., Gama, J., Chawla, N.V., Ganguly, A.R. (eds) Knowledge Discovery from Sensor Data. Sensor-KDD 2008. Lecture Notes in Computer Science, vol 5840. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12519-5_2

  • DOI: https://doi.org/10.1007/978-3-642-12519-5_2

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-12518-8

  • Online ISBN: 978-3-642-12519-5

  • eBook Packages: Computer Science, Computer Science (R0)
