Data Stream Mining

Gaber, Mohamed Medhat; Zaslavsky, Arkady; Krishnaswamy, Shonali

doi:10.1007/978-0-387-09823-4_39

Chapter
First Online: 01 January 2010

Abstract

Data mining is the process of computationally extracting hidden knowledge structures, represented as models and patterns, from large data repositories. It is an interdisciplinary field with roots in databases, statistics, machine learning, and data visualization. Data mining emerged as a direct outcome of the data explosion that resulted from the success of database and data warehousing technologies over the past two decades (Fayyad, 1997; Fayyad, 1998; Kantardzic, 2003).

References

A. Arasu, B. Babcock, S. Babu, M. Datar, K. Ito, I. Nishizawa, J. Rosenstein, and J. Widom. STREAM: The Stanford Stream Data Manager Demonstration description - short overview of system status and plans, in Proc. of the ACM Intl Conf. on Management of Data (SIGMOD 2003), June 2003, p. 665.
D. Abadi, D. Carney, U. Cetintemel, M. Cherniack, C. Convey, C. Erwin, E. Galvez, M. Hatoun, J. Hwang, A. Maskey, A. Rasin, A. Singer, M. Stonebraker, N. Tatbul, Y. Xing, R. Yan, S. Zdonik. Aurora: A Data Stream Management System (Demonstration). Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD’03), San Diego, CA, June 2003.
C. Aggarwal, J. Han, J. Wang, P. S. Yu, A Framework for Clustering Evolving Data Streams, Proc. 2003 Int. Conf. on Very Large Data Bases (VLDB’03), Berlin, Germany, Sept. 2003, pp. 81-92.
C. Aggarwal, J. Han, J. Wang, and P. S. Yu, A Framework for Projected Clustering of High Dimensional Data Streams, Proc. 2004 Int. Conf. on Very Large Data Bases (VLDB’04), Toronto, Canada, Aug. 2004, pp. 852-863.
C. Aggarwal, J. Han, J. Wang, and P. S. Yu, On Demand Classification of Data Streams, Proc. 2004 Int. Conf. on Knowledge Discovery and Data Mining (KDD’04), Seattle, WA, Aug. 2004, pp. 503-508.
I.F. Akyildiz, W. Su, Y. Sankarasubramaniam, and E. Cayirci. A Survey on Sensor Networks, IEEE Communications Magazine, August 2002, pp. 102-114.
B. Babcock, S. Babu, M. Datar, R. Motwani, and J. Widom. Models and issues in data stream systems, Proceedings of PODS, 2002, pp. 1-16.
B. Babcock, M. Datar, and R. Motwani. Load Shedding Techniques for Data Stream Systems (short paper), Proc. of the 2003 Workshop on Management and Processing of Data Streams (MPDS 2003), June 2003
B. Babcock, M. Datar, R. Motwani, L. O’Callaghan, Maintaining Variance and k-Medians over Data Stream Windows, Proceedings of the 22nd Symposium on Principles of Database Systems (PODS 2003), pp. 234 - 243.
M. Burl, Ch. Fowlkes, J. Roden, A. Stechert, and S. Mukhtar, Diamond Eye: A distributed architecture for image data mining, in SPIE DMKD, Orlando, April 1999, pp. 197-206.
M. Charikar, L. O’Callaghan, and R. Panigrahy, Better streaming algorithms for clustering problems, Proc. of 35th ACM Symposium on Theory of Computing (STOC), 2003, pp. 30-39.
Y.D. Cai, D. Clutter, G. Pape, J. Han, M. Welge, and L. Auvil, MAIDS: Mining Alarming Incidents from Data Streams, (system demonstration), Proc. 2004 ACM-SIGMOD Int. Conf. Management of Data (SIGMOD’04), Paris, France, June 2004, pp. 919 - 920.
Y. Chen, G. Dong, J. Han, B.W. Wah, and J. Wang, Multi-Dimensional Regression Analysis of Time-Series Data Streams, Proceedings of VLDB Conference, 2002, pp. 323-334.
B. Castano, M. Judd, R. C. Anderson, and T. Estlin, Machine Learning Challenges in Mars Rover Traverse Science, Proc. of the ICML 2003 workshop on Machine Learning Technologies for Autonomous Space Applications.
C. Cranor, Johnson, T., Spatscheck, O., and Shkapenyuk, V., Gigascope: a stream database for network applications, In Proceedings of the 2003 ACM SIGMOD International Conference on Management of Data (San Diego, California, June 09 - 12, 2003). SIGMOD ’03. ACM, New York, NY, pp. 647-651.
L. O’Callaghan, Nina Mishra, Adam Meyerson, Sudipto Guha, and Rajeev Motwani, Streaming-data algorithms for high-quality clustering, Proceedings of IEEE International Conference on Data Engineering, March 2002, pp. 685-697.
G. Cormode, S. Muthukrishnan, What’s hot and what’s not: tracking most frequent items dynamically, PODS 2003, pp. 296-306
J. Coughlan, Accelerating Scientific Discovery at NASA, SIAM SDM 2004, Florida USA.
G. Cormode and S. Muthukrishnan, What is new: Finding significant differences in network data streams, INFOCOM 2004.
Y. Chi, Philip S. Yu, Haixun Wang, Richard R. Muntz, Loadstar: A Load Shedding Scheme for Classifying Data Streams, The 2005 SIAM International Conference on Data Mining (SIAM SDM’05), 2005.
G. Dong, J. Han, L.V.S. Lakshmanan, J. Pei, H. Wang and P.S. Yu. Online mining of changes from data streams: Research problems and preliminary results, Proceedings of the 2003 ACM SIGMOD Workshop on Management and Processing of Data Streams. In cooperation with the 2003 ACM-SIGMOD International Conference on Management of Data (SIGMOD’03), San Diego, CA, June 8, 2003.
P. Domingos and G. Hulten, Mining High-Speed Data Streams, In Proceedings of the Association for Computing Machinery Sixth International Conference on Knowledge Discovery and Data Mining, 2000, pp. 71-80
P. Domingos and G. Hulten. Catching Up with the Data: Research Issues in Mining Data Streams, Workshop on Research Issues in Data Mining and Knowledge Discovery, 2001, Santa Barbara, CA.
P. Domingos and G. Hulten, A General Method for Scaling Up Machine Learning Algorithms and its Application to Clustering, Proceedings of the Eighteenth International Conference on Machine Learning, 2001, Williamstown, MA, Morgan Kaufmann, pp. 106-113.
M. Dunham. Data Mining: Introductory and Advanced Topics. Pearson Education, 2003.
F.J. Ferrer-Troyano, J.S. Aguilar-Ruiz and J.C. Riquelme, Discovering Decision Rules from Numerical Data Streams, ACM Symposium on Applied Computing - SAC04, 2004, ACM Press, pp. 649-653.
U.M. Fayyad: Knowledge Discovery in Databases: An Overview. ILP 1997, pp. 3-16
U.M. Fayyad: Mining Databases: Towards Algorithms for Knowledge Discovery. IEEE Data Eng. Bull. 21(1), 1998 pp. 39-48.
U.M. Fayyad, Georges G. Grinstein, Andreas Wierse: Information Visualization in Data Mining and Knowledge Discovery. Morgan Kaufmann, 2001.
M.M. Gaber, Yu P. S., A Holistic Approach for Resource-aware Adaptive Data Stream Mining, Journal of New Generation Computing, Special Issue on Knowledge Discovery from Data Streams, 2006.
V. Ganti, Johannes Gehrke, Raghu Ramakrishnan: Mining Data Streams under Block Evolution. SIGKDD Explorations 3(2), 2002, pp. 1-10.
M. Garofalakis, Johannes Gehrke, Rajeev Rastogi: Querying and mining data streams: you only get one look a tutorial. SIGMOD Conference 2002: 635
C. Giannella, J. Han, J. Pei, X. Yan, and P.S. Yu, Mining Frequent Patterns in Data Streams at Multiple Time Granularities, in H. Kargupta, A. Joshi, K. Sivakumar, and Y. Yesha (eds.), Next Generation Data Mining, AAAI/MIT, 2003.
A.C. Gilbert, Yannis Kotidis, S. Muthukrishnan, Martin Strauss: One-Pass Wavelet Decompositions of Data Streams. TKDE 15(3), 2003, pp. 541-554.
M.M. Gaber, Krishnaswamy, S., and Zaslavsky, A., On-board Mining of Data Streams in Sensor Networks, a book chapter in Advanced Methods of Knowledge Discovery from Complex Data, (Eds.) Sanghamitra Badhyopadhyay, Ujjwal Maulik, Lawrence Holder and Diane Cook, Springer Verlag, 2005.
R. Grossman, Supporting the Data Mining Process with Next Generation Data Mining Systems, Enterprise Systems, August 1998.
M.M. Gaber, Zaslavsky, A., and Krishnaswamy, S., Towards an Adaptive Approach for Mining Data Streams in Resource Constrained Environments, Proceedings of Sixth International Conference on Data Warehousing and Knowledge Discovery - Industry Track (DaWaK 2004), Zaragoza, Spain, 30 August - 3 September, Lecture Notes in Computer Science (LNCS), Springer Verlag.
S. Guha, N. Mishra, R. Motwani, and L. O’Callaghan, Clustering data streams, Proceedings of the Annual Symposium on Foundations of Computer Science. IEEE, November 2000, pp. 359-366.
S. Guha, Adam Meyerson, Nina Mishra, Rajeev Motwani, and Liadan O’Callaghan, Clustering Data Streams: Theory and Practice TKDE special issue on clustering, vol. 15, 2003, pp. 515-528.
D.J. Hand, Statistics and Data Mining: Intersecting Disciplines, ACM SIGKDD Explorations, 1, 1, June 1999, pp. 16-19.
D.J. Hand, Mannila H., and Smyth P. Principles of data mining, MIT Press, 2001.
W. Hoeffding. Probability inequalities for sums of bounded random variables, Journal of the American Statistical Association (58), 1963, pp. 13-30.
J. Han, Pei, J., and Yin, Y, Mining frequent patterns without candidate generation, In Proc. 2000 ACM-SIGMOD Int. Conf. Management of Data (SIGMOD’00), pp. 1-12.
G. Hulten, L. Spencer, and P. Domingos. Mining Time-Changing Data Streams. ACM SIGKDD 2001, pp. 97-106.
M. Henzinger, P. Raghavan and S. Rajagopalan, Computing on data streams, Technical Note 1998-011, Digital Systems Research Center, Palo Alto, CA, May 1998.
T. Hastie, R. Tibshirani, J. Friedman, The elements of statistical learning: data mining, inference, and prediction, New York: Springer, 2001
P. Indyk, N. Koudas, and S. Muthukrishnan, Identifying Representative Trends in Massive Time Series Data Sets Using Sketches. In Proc. of the 26th Int. Conf. on Very Large Data Bases, Cairo, Egypt, September 2000, pp. 363 - 372.
C. Jin, Weining Qian, Chaofeng Sha, Jeffrey X. Yu, and Aoying Zhou, Dynamically Maintaining Frequent Items over a Data Stream, In Proceedings of the 12th ACM Conference on Information and Knowledge Management (CIKM’2003), pp. 287-294
M. Kantardzic, Data mining: concepts, models, methods and algorithms, Piscataway, NJ: IEEE Pr. Wiley Interscience, 2003.
H. Kargupta, Ruchita Bhargava, Kun Liu, Michael Powers, Patrick Blair, Samuel Bushra, James Dull, Kakali Sarkar, Martin Klein, Mitesh Vasa, and David Handy, VEDAS: A Mobile and Distributed Data Stream Mining System for Real-Time Vehicle Monitoring, Proceedings of SIAM International Conference on Data Mining 2004.
S. Krishnamurthy, S. Chandrasekaran, O. Cooper, A. Deshpande, M. Franklin, J. Hellerstein, W. Hong, S. Madden, V. Raman, F. Reiss, and M. Shah. TelegraphCQ: An Architectural Status Report. IEEE Data Engineering Bulletin, Vol 26(1), March 2003.
E. Keogh, J. Lin, and W. Truppel. Clustering of Time Series Subsequences is Meaningless: Implications for Past and Future Research. In proceedings of the 3rd IEEE International Conference on Data Mining. Melbourne, FL. Nov 19-22, 2003, pp. 115-122.
H. Kargupta, Park, B., Pittie, S., Liu, L., Kushraj, D. and Sarkar, K. (2002). MobiMine: Monitoring the Stock Market from a PDA. ACM SIGKDD Explorations. January 2002. Volume 3, Issue 2, ACM Press, pp. 37-46.
B. Krishnamachari and S.S. Iyengar. Efficient and Fault-tolerant Feature Extraction in Sensor Networks. In Proceedings of the 2nd International Workshop on Information Processing in Sensor Networks (IPSN ’03), Palo Alto, California, April 2003.
B. Krishnamachari and S. Iyengar. Distributed Bayesian Algorithms for Fault-tolerant Event Region Detection in Wireless Sensor Networks. IEEE Transactions on Computers, vol. 53, No. 3, March 2004.
M. Last, Online Classification of Nonstationary Data Streams, Intelligent Data Analysis, Vol. 6, No. 2, 2002, pp. 129-147.
Y. Law, C. Zaniolo, An Adaptive Nearest Neighbor Classification Algorithm for Data Streams, Proceedings of the 9th European Conference on the Principles and Practice of Knowledge Discovery in Databases (PKDD 2005), Springer Verlag, Porto, Portugal, October 3-7, 2005, pp. 108-120.
J. Lin, E. Keogh, S. Lonardi, and B. Chiu, A Symbolic Representation of Time Series, with Implications for Streaming Algorithms, In proceedings of the 8th ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery. San Diego, CA. June 13, 2003, pp. 2-11.
G.S. Manku and R. Motwani. Approximate frequency counts over data streams. In Proceedings of the 28th International Conference on Very Large Data Bases, Hong Kong, China, August 2002, pp. 346-357.
R. Moskovitch, Y. Elovici, L. Rokach, Detection of unknown computer worms based on behavioral classification of the host, Computational Statistics and Data Analysis, 52(9):4544–4566, 2008.
S. Muthukrishnan, Data streams: algorithms and applications. Proceedings of the fourteenth annual ACM-SIAM symposium on discrete algorithms, 2003.
O. Nasraoui, Cardona C., Rojas C., and Gonzalez F., Mining Evolving User Profiles in Noisy Web Clickstream Data with a Scalable Immune System Clustering Algorithm, in Proc. of WebKDD 2003 - KDD Workshop on Web mining as a Premise to Effective and Intelligent Web Applications, Washington DC, August 2003, p. 71.
C. Ordonez. Clustering Binary Data Streams with K-means ACM DMKD 2003.
B. Park and H. Kargupta. Distributed Data Mining: Algorithms, Systems, and Applications, Data Mining Handbook. Editor: Nong Ye. 2002.
E. Perlman and A. Java, Predictive Mining of Time Series Data in Astronomy. In ASP Conf. Ser. 295: Astronomical Data Analysis Software and Systems XII, 2003.
S. Papadimitriou, C. Faloutsos, and A. Brockwell, Adaptive, Hands-Off Stream Mining, 29th International Conference on Very Large Data Bases VLDB, 2003.
S. Pirttikangas, J. Riekki, J. Kaartinen, J. Miettinen, S. Nissila, J. Roning. Genie Of The Net: A New Approach For A Context-Aware Health Club. In Proceedings of Joint 12th ECML’01 and 5th European Conference on PKDD’01. September 3-7, 2001, Freiburg, Germany.
L. Rokach, Decomposition methodology for classification tasks: a meta decomposer framework, Pattern Analysis and Applications, 9(2006):257–271.
L. Rokach, O. Maimon and R. Arbel, Selective voting-getting more for less in sensor fusion, International Journal of Pattern Recognition and Artificial Intelligence 20 (3) (2006), pp. 329–350.
A. Srivastava and J. Stroeve, Onboard Detection of Snow, Ice, Clouds and Other Geophysical Processes Using Kernel Methods, Proceedings of the ICML’03 workshop on Machine Learning Technologies for Autonomous Space Applications.
S. Tanner, M. Alshayeb, E. Criswell, M. Iyer, A. McDowell, M. McEniry, K. Regner, EVE: On-Board Process Planning and Execution, Earth Science Technology Conference, Pasadena, CA, Jun. 11 - 14, 2002.
N. Tatbul, U. Cetintemel, S. Zdonik, M. Cherniack and M. Stonebraker, Load Shedding in a Data Stream Manager Proceedings of the 29th International Conference on Very Large Data Bases (VLDB), September, 2003.
N. Tatbul, U. Cetintemel, S. Zdonik, M. Cherniack, M. Stonebraker. Load Shedding on Data Streams, In Proceedings of the Workshop on Management and Processing of Data Streams (MPDS 03), San Diego, CA, USA, June 8, 2003.
H. Toivonen, Sampling large databases for association rules, Proceeding of VLDB Conference, 1996
Y. Yao, J. E. Gehrke, The Cougar Approach to In-Network Query Processing in Sensor Networks, SIGMOD Record, Volume 31, Number 3. September 2002, pp. 9-18.
H. Wang, W. Fan, P. Yu and J. Han, Mining Concept-Drifting Data Streams using Ensemble Classifiers, in the 9th ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD), Aug. 2003, Washington DC, USA.
Y. Zhu and D. Shasha, Efficient Elastic Burst Detection in Data Streams, The Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining KDD-2003 24 August 2003 - 27 August 2003, pp 336 - 345.

Author information

Authors and Affiliations

Centre for Distributed Systems and Software Engineering, Monash University, Victoria, 3800, Australia
Mohamed Medhat Gaber, Arkady Zaslavsky & Shonali Krishnaswamy

Corresponding author

Correspondence to Shonali Krishnaswamy.

Editor information

Editors and Affiliations

Dept. Industrial Engineering, Tel Aviv University, Ramat Aviv, 69978, Israel
Oded Maimon
Dept. Information Systems Engineering, Ben-Gurion University of the Negev, Beer-Sheva, 84105, Israel
Lior Rokach

About this chapter

Cite this chapter

Gaber, M.M., Zaslavsky, A., Krishnaswamy, S. (2009). Data Stream Mining. In: Maimon, O., Rokach, L. (eds) Data Mining and Knowledge Discovery Handbook. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-09823-4_39

DOI: https://doi.org/10.1007/978-0-387-09823-4_39
Published: 07 July 2010
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-09822-7
Online ISBN: 978-0-387-09823-4
eBook Packages: Computer Science, Computer Science (R0)
