DOI: 10.1145/3062341.3062366

Low-synchronization, mostly lock-free, elastic scheduling for streaming runtimes

Published: 14 June 2017

Abstract

We present the scalable, elastic operator scheduler in IBM Streams 4.2. Streams is a distributed stream processing system used in production at many companies in a wide range of industries. The programming language for Streams, SPL, presents operators, tuples and streams as the primary abstractions. A fundamental SPL optimization is operator fusion, where multiple operators execute in the same process. Streams 4.2 introduces automatic submission-time fusion to simplify application development and deployment. However, potentially thousands of operators could then execute in the same process, with no user guidance for thread placement. We needed a way to automatically figure out how many threads to use, with arbitrarily sized applications on a wide variety of hardware, and without any input from programmers. Our solution has two components. The first is a scalable operator scheduler that minimizes synchronization, locks and global data, while allowing threads to execute any operator and dynamically come and go. The second is an elastic algorithm to dynamically adjust the number of threads to optimize performance, using the principles of trusted measurements to establish trends. We demonstrate our scheduler's ability to scale to over a hundred threads, and our elasticity algorithm's ability to adapt to different workloads on an Intel Xeon system with 176 logical cores, and an IBM Power8 system with 184 logical cores.
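The abstract describes the elasticity component only at a high level: adjust the thread count, and act only on measurements that can be trusted to establish a trend. The sketch below illustrates that general idea with a simple hill-climbing controller. It is not the algorithm from the paper or from IBM Streams. The class name `ElasticThreadController`, the parameters `samples_per_level` and `tolerance`, and the synthetic workload are all assumptions made for illustration.

```python
from statistics import mean


class ElasticThreadController:
    """Hypothetical hill-climbing controller (illustrative, not the Streams algorithm)."""

    def __init__(self, min_threads=1, max_threads=8,
                 samples_per_level=3, tolerance=0.05):
        self.min_threads = min_threads
        self.max_threads = max_threads
        self.samples_per_level = samples_per_level  # samples needed before a level is "trusted"
        self.tolerance = tolerance                  # relative gain required to count as a trend
        self.threads = min_threads
        self.settled = False
        self._samples = []
        self._previous = None  # trusted throughput at the previous thread level

    def observe(self, throughput):
        """Record one throughput sample and return the thread count to use next."""
        if self.settled:
            return self.threads
        self._samples.append(throughput)
        if len(self._samples) < self.samples_per_level:
            return self.threads  # not enough samples to trust this level yet

        current = mean(self._samples)
        self._samples = []
        improved = (self._previous is None
                    or current > self._previous * (1 + self.tolerance))
        if improved:
            self._previous = current
            if self.threads == self.max_threads:
                self.settled = True  # nowhere left to grow
            else:
                self.threads += 1
        else:
            # The extra thread did not help: step back and stop adjusting.
            self.threads = max(self.min_threads, self.threads - 1)
            self.settled = True
        return self.threads


def synthetic_throughput(threads):
    """Assumed workload: scales linearly up to 4 threads, then flattens."""
    return 100.0 * min(threads, 4)


def run_until_settled(controller, workload, max_steps=1000):
    """Feed workload measurements to the controller until it stops adjusting."""
    for _ in range(max_steps):
        if controller.settled:
            break
        controller.observe(workload(controller.threads))
    return controller.threads
```

With the synthetic workload above, `run_until_settled(ElasticThreadController(), synthetic_throughput)` climbs one thread at a time, sees no gain at five threads, and steps back to four. A production controller would also need to resume exploring when the workload changes, which this sketch deliberately omits.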


Published In

PLDI 2017: Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation
June 2017
708 pages
ISBN:9781450349888
DOI:10.1145/3062341
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. elastic
  2. lock-free
  3. runtime scheduling
  4. stream processing

Qualifiers

  • Research-article

Conference

PLDI '17
Acceptance Rates

Overall Acceptance Rate 406 of 2,067 submissions, 20%

Cited By

  • (2021) WindFlow: High-Speed Continuous Stream Processing With Parallel Building Blocks. IEEE Transactions on Parallel and Distributed Systems 32:11 (2748-2763). DOI: 10.1109/TPDS.2021.3073970. Online publication date: 1-Nov-2021
  • (2020) Resource Management and Scheduling in Distributed Stream Processing Systems. ACM Computing Surveys 53:3 (1-41). DOI: 10.1145/3355399. Online publication date: 28-May-2020
  • (2019) Automating Multi-level Performance Elastic Components for IBM Streams. Proceedings of the 20th International Middleware Conference (163-175). DOI: 10.1145/3361525.3361544. Online publication date: 9-Dec-2019
  • (2019) Scaling Ordered Stream Processing on Shared-Memory Multicores. Proceedings of Real-Time Business Intelligence and Analytics (1-10). DOI: 10.1145/3350489.3350495. Online publication date: 26-Aug-2019
  • (2019) Efficient and Scalable Execution of Fine-Grained Dynamic Linear Pipelines. ACM Transactions on Architecture and Code Optimization 16:2 (1-26). DOI: 10.1145/3307411. Online publication date: 18-Apr-2019
  • (2019) Automated multi-dimensional elasticity for streaming runtimes. Proceedings of the 24th Symposium on Principles and Practice of Parallel Programming (427-428). DOI: 10.1145/3293883.3301492. Online publication date: 16-Feb-2019
  • (2018) Challenges and experiences in building an efficient apache beam runner for IBM streams. Proceedings of the VLDB Endowment 11:12 (1742-1754). DOI: 10.14778/3229863.3229864. Online publication date: 1-Aug-2018
  • (2023) PCGC: a performance compact graph compiler based on multilevel fusion-splitting rules. The Journal of Supercomputing 79:15 (17419-17444). DOI: 10.1007/s11227-023-05298-w. Online publication date: 7-May-2023
  • (2021) Self-adaptation on parallel stream processing: A systematic review. Concurrency and Computation: Practice and Experience 34:6. DOI: 10.1002/cpe.6759. Online publication date: 7-Dec-2021
  • (2019) Edgewise. Proceedings of the 2019 USENIX Conference on Usenix Annual Technical Conference (929-945). DOI: 10.5555/3358807.3358887. Online publication date: 10-Jul-2019
