CPU load shedding for binary stream joins

  • Regular Paper
  • Knowledge and Information Systems

Abstract

We present an adaptive load shedding approach for windowed stream joins. In contrast to the conventional approach of dropping tuples from the input streams, we explore the concept of selective processing for load shedding. We allow stream tuples to be stored in the windows and shed excessive CPU load by performing the join operations, not on the entire set of tuples within the windows, but on a dynamically changing subset of tuples that are learned to be highly beneficial. We support such dynamic selective processing through three forms of runtime adaptations: adaptation to input stream rates, adaptation to time correlation between the streams and adaptation to join directions. Our load shedding approach enables us to integrate utility-based load shedding with time correlation-based load shedding. Indexes are used to further speed up the execution of stream joins. Experiments are conducted to evaluate our adaptive load shedding in terms of output rate and utility. The results show that our selective processing approach to load shedding is very effective and significantly outperforms the approach that drops tuples from the input streams.
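
The idea of selective processing can be illustrated with a minimal Python sketch. It is not the authors' algorithm: it assumes that each join window is split into time-ordered segments, that a per-segment match count stands in for the learned benefit, and that a single fraction represents the CPU budget. The names selective_probe and adapt_fraction, the 1.1 growth factor and the 0.01 floor are all hypothetical choices made for illustration.

```python
def selective_probe(window, probe_key, scores, fraction):
    """Join one incoming tuple against only the most beneficial segments.

    window   : list of segments, each a list of (key, payload) tuples
    scores   : per-segment match counts learned from earlier probes (assumed
               here as a stand-in for "learned to be highly beneficial")
    fraction : share of segments the CPU budget allows us to probe, in (0, 1]
    """
    budget = max(1, int(len(window) * fraction))
    # Rank segments by learned benefit, highest first (ties keep window order).
    ranked = sorted(range(len(window)), key=lambda i: scores[i], reverse=True)
    matches = []
    for i in ranked[:budget]:
        hits = [payload for key, payload in window[i] if key == probe_key]
        scores[i] += len(hits)  # keep learning from the segments we did probe
        matches.extend(hits)
    return matches


def adapt_fraction(fraction, consumed, arrived):
    """Adapt the probed fraction to the input rate (hypothetical rule).

    Shrinks the fraction in proportion to the backlog when tuples arrive
    faster than they are consumed, and grows it by 10% otherwise.
    """
    if arrived > 0 and consumed < arrived:
        fraction *= consumed / arrived
    else:
        fraction *= 1.1
    return min(1.0, max(0.01, fraction))


# Three segments of the opposite stream's window; segment 1 has been the
# most productive so far.
window = [[("a", 1)], [("b", 2), ("a", 3)], [("a", 4)]]
scores = [0, 5, 1]

under_load = selective_probe(window, "a", scores, fraction=0.5)  # one segment
full_join = selective_probe(window, "a", scores, fraction=1.0)   # all segments
```

All tuples stay in the window in this sketch; load is shed only by probing fewer segments, which is the contrast with dropping tuples at the input that the abstract draws. A real system would also need to probe low-ranked segments occasionally so that their scores do not go stale.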

Author information

Corresponding author

Correspondence to Bugra Gedik.

Additional information

Bugra Gedik received the B.S. degree in C.S. from Bilkent University, Ankara, Turkey, and the Ph.D. degree in C.S. from the College of Computing at the Georgia Institute of Technology, Atlanta, GA, USA. He is with the IBM Thomas J. Watson Research Center, currently a member of the Software Tools and Techniques Group. Dr. Gedik's research interests lie in data intensive distributed computing systems, spanning data-centric peer-to-peer overlay networks, mobile and sensor-based distributed data management systems, and distributed data stream processing systems. His research focus is on developing system-level architectures and techniques to address scalability problems in distributed continual query systems and applications. He is the recipient of the ICDCS 2003 best paper award. He has served on the program committees of several international conferences, such as ICDE, MDM, and CollaborateCom.

Kun-Lung Wu received the B.S. degree in E.E. from the National Taiwan University, Taipei, Taiwan, and the M.S. and Ph.D. degrees in C.S., both from the University of Illinois at Urbana-Champaign. He is with the IBM Thomas J. Watson Research Center, currently a member of the Software Tools and Techniques Group. His recent research interests include data streams, continual queries, mobile computing, Internet technologies and applications, database systems and distributed computing. He has published extensively and holds many patents in these areas.

Dr. Wu is a Senior Member of the IEEE Computer Society and a member of the ACM. He is the Program Co-Chair for the IEEE Joint Conference on e-Commerce Technology (CEC 2007) and Enterprise Computing, e-Commerce and e-Services (EEE 2007). He was an Associate Editor for the IEEE Trans. on Knowledge and Data Engineering, 2000–2004. He was the general chair for the 3rd International Workshop on E-Commerce and Web-Based Information Systems (WECWIS 2001). He has served as an organizing and program committee member on various conferences. He has received various IBM awards, including IBM Corporate Environmental Affair Excellence Award, Research Division Award, and several Invention Achievement Awards. He received a best paper award from IEEE EEE 2004. He is an IBM Master Inventor.

Philip S. Yu received the B.S. Degree in E.E. from National Taiwan University, the M.S. and Ph.D. degrees in E.E. from Stanford University, and the M.B.A. degree from New York University. He is with the IBM Thomas J. Watson Research Center and currently manager of the Software Tools and Techniques group. His research interests include data mining, Internet applications and technologies, database systems, multimedia systems, parallel and distributed processing, and performance modeling. Dr. Yu has published more than 430 papers in refereed journals and conferences. He holds or has applied for more than 250 US patents.

Dr. Yu is a Fellow of the ACM and a Fellow of the IEEE. He is an associate editor of ACM Transactions on the Internet Technology and ACM Transactions on Knowledge Discovery in Data. He is a member of the IEEE Data Engineering steering committee and is also on the steering committee of the IEEE Conference on Data Mining. He was the Editor-in-Chief of IEEE Transactions on Knowledge and Data Engineering (2001–2004), an editor, advisory board member and also a guest co-editor of the special issue on mining of databases. He has also served as an associate editor of Knowledge and Information Systems. In addition to serving as a program committee member for various conferences, he will be serving as the general chair of the 2006 ACM Conference on Information and Knowledge Management and the program chair of the 2006 joint conferences of the 8th IEEE Conference on E-Commerce Technology (CEC' 06) and the 3rd IEEE Conference on Enterprise Computing, E-Commerce and E-Services (EEE' 06). He was the program chair or co-chair of the 11th IEEE Intl. Conference on Data Engineering, the 6th Pacific Area Conference on Knowledge Discovery and Data Mining, the 9th ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery, the 2nd IEEE Intl. Workshop on Research Issues on Data Engineering: Transaction and Query Processing, the PAKDD Workshop on Knowledge Discovery from Advanced Databases, and the 2nd IEEE Intl. Workshop on Advanced Issues of E-Commerce and Web-based Information Systems. He served as the general chair of the 14th IEEE Intl. Conference on Data Engineering and the general co-chair of the 2nd IEEE Intl. Conference on Data Mining. He has received several IBM honors including 2 IBM Outstanding Innovation Awards, an Outstanding Technical Achievement Award, 2 Research Division Awards and the 84th plateau of Invention Achievement Awards. He received an Outstanding Contributions Award from the IEEE Intl. Conference on Data Mining in 2003 and also an IEEE Region 1 Award for “promoting and perpetuating numerous new electrical engineering concepts” in 1999. Dr. Yu is an IBM Master Inventor.

Ling Liu is an associate professor at the College of Computing at Georgia Tech. There, she directs the research programs in the Distributed Data Intensive Systems Lab (DiSL), examining research issues and technical challenges in building large scale distributed computing systems that can grow without limits. Dr. Liu and the DiSL research group have been working on various aspects of distributed data intensive systems, ranging from decentralized overlay networks, exemplified by peer to peer computing, data grid computing, to mobile computing systems and location based services, sensor network computing, and enterprise computing systems. She has published over 150 international journal and conference articles. Her research group has produced a number of software systems that are either open source or directly accessible online, among which the most popular ones are WebCQ and XWRAPElite. Dr. Liu is currently on the editorial board of several international journals, including IEEE Transactions on Knowledge and Data Engineering, International Journal of Very Large Database Systems (VLDBJ), and International Journal of Web Services Research, and has chaired a number of conferences as a PC chair, a vice PC chair, or a general chair, including the IEEE International Conference on Data Engineering (ICDE 2004, ICDE 2006, ICDE 2007), the IEEE International Conference on Distributed Computing (ICDCS 2006), and the IEEE International Conference on Web Services (ICWS 2004). She is a recipient of the IBM Faculty Award (2003, 2006). Dr. Liu's current research is partly sponsored by grants from NSF CISE CSR, ITR, CyberTrust, a grant from AFOSR, an IBM SUR grant, and an IBM faculty award.

About this article

Cite this article

Gedik, B., Wu, KL., Yu, P.S. et al. CPU load shedding for binary stream joins. Knowl Inf Syst 13, 271–303 (2007). https://doi.org/10.1007/s10115-006-0044-4

  • DOI: https://doi.org/10.1007/s10115-006-0044-4
