Summary
The most challenging applications of knowledge discovery involve dynamic environments where data flow continuously at high speed and exhibit non-stationary properties. In this chapter we discuss the main challenges of learning from data streams, focusing on the most relevant issues: incremental learning, cost-performance management, change detection, and novelty detection. We present illustrative algorithms for these learning tasks and a real-world application illustrating the advantages of stream processing. The chapter ends with some open issues that emerge from this new research area.
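To make two of the issues named above concrete, the following is a minimal sketch of one-pass change detection built on an incrementally updated mean. It uses the Page-Hinkley test, chosen here only for illustration; it is not necessarily one of the algorithms the chapter presents. The class name, the helper `first_alarm`, and the parameter values `delta` and `threshold` are assumptions.

```python
from dataclasses import dataclass


@dataclass
class PageHinkley:
    """One-pass detector for an upward shift in the mean of a stream."""
    delta: float = 0.05      # tolerated magnitude of change (assumed value)
    threshold: float = 5.0   # alarm threshold (assumed value)
    n: int = 0               # number of observations seen so far
    mean: float = 0.0        # incrementally maintained mean
    cum: float = 0.0         # cumulative deviation from the running mean
    cum_min: float = 0.0     # smallest cumulative deviation seen so far

    def update(self, x: float) -> bool:
        """Consume one observation; return True if a change is signalled."""
        self.n += 1
        # Incremental learning step: update the mean without storing past data.
        self.mean += (x - self.mean) / self.n
        # Change detection step: accumulate deviations and compare to the minimum.
        self.cum += x - self.mean - self.delta
        self.cum_min = min(self.cum_min, self.cum)
        return self.cum - self.cum_min > self.threshold


def first_alarm(stream, detector):
    """Return the index of the first alarm, or None if no change is signalled."""
    for i, x in enumerate(stream):
        if detector.update(x):
            return i
    return None


# Hypothetical streams: one stationary, one whose mean jumps at index 100.
stationary = [0.0] * 200
shifted = [0.0] * 100 + [1.0] * 100
print(first_alarm(stationary, PageHinkley()))
print(first_alarm(shifted, PageHinkley()))
```

Each observation is processed once, in constant time and memory, which is the constraint that separates stream learning from batch learning. The detector raises no alarm on the stationary stream and signals a change a few observations after the mean shifts.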
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Gama, J., Rodrigues, P.P. (2009). An Overview on Mining Data Streams. In: Abraham, A., Hassanien, AE., de Leon F. de Carvalho, A.P., Snášel, V. (eds) Foundations of Computational Intelligence Volume 6. Studies in Computational Intelligence, vol 206. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-01091-0_2
DOI: https://doi.org/10.1007/978-3-642-01091-0_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-01090-3
Online ISBN: 978-3-642-01091-0
eBook Packages: Engineering, Engineering (R0)