Summary
Significant applications require data stream mining algorithms to run in resource-constrained environments. Adaptation is therefore a key process for ensuring the consistency and continuity of the running algorithms. This chapter provides a theoretical framework for applying the granularity-based approach to mining data streams. Our Algorithm Output Granularity (AOG) approach is explained in detail, giving practitioners the ability to use it to enable resource-awareness and adaptability in their algorithms. Theoretically, AOG has been formalized using the Probably Approximately Correct (PAC) learning model, allowing researchers to formalize the adaptability of their techniques. Finally, the integration of AOG with other adaptation strategies is presented.
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Gaber, M.M. (2009). Data Stream Mining Using Granularity-Based Approach. In: Abraham, A., Hassanien, AE., de Leon F. de Carvalho, A.P., Snášel, V. (eds) Foundations of Computational Intelligence Volume 6. Studies in Computational Intelligence, vol 206. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-01091-0_3
DOI: https://doi.org/10.1007/978-3-642-01091-0_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-01090-3
Online ISBN: 978-3-642-01091-0
eBook Packages: Engineering (R0)