Abstract
We introduce an elastic queue middleware (EQM) for distributed stream processing architectures that handles sharply growing input streams at peak times and maintains resource utilization at off-peak times. EQM serves as a scalable stream buffer that resolves stream processing bottlenecks on the fly. When data rates spike, the stream buffer that holds the input tuples of a bottleneck operator scales out in EQM to alleviate back pressure immediately, so the streaming engine can gradually deploy additional replicas of the bottleneck operator to cope with the increased data rates. This differs from conventional elastic stream processing, where bottleneck operators scale out first and the stream buffers are allocated afterwards. To implement the scalable buffer, EQM builds on existing scalable data stores (e.g., HBase), which avoids re-implementing elasticity and scalability logic while still ensuring load balancing. Experimental results show that EQM achieves stable throughput at varying data rates.
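The buffer-first scaling idea can be illustrated with a minimal in-memory sketch. This is not the EQM implementation, which the abstract says builds on a scalable data store such as HBase. The class name `ElasticQueue`, the `max_backlog_per_partition` threshold, and the round-robin placement are all assumptions made for illustration only.

```python
from collections import deque


class ElasticQueue:
    """Toy in-memory stand-in for a scalable stream buffer (hypothetical)."""

    def __init__(self, max_backlog_per_partition=4):
        # Illustrative threshold; not a value taken from the paper.
        self.max_backlog = max_backlog_per_partition
        self.partitions = [deque()]
        self._next_write = 0  # round-robin cursor for enqueue
        self._next_read = 0   # round-robin cursor for dequeue

    def backlog(self):
        """Total number of buffered tuples across all partitions."""
        return sum(len(p) for p in self.partitions)

    def enqueue(self, item):
        # Scale the buffer out first: add a partition as soon as the
        # backlog reaches the capacity of the current partitions, so the
        # producer is never blocked while operator replicas are still
        # being deployed.
        if self.backlog() >= self.max_backlog * len(self.partitions):
            self.partitions.append(deque())
        idx = self._next_write % len(self.partitions)
        self.partitions[idx].append(item)
        self._next_write += 1

    def dequeue(self):
        # "Read once": a tuple is removed when it is handed to a consumer.
        # Each partition is tried at most once per call.
        for _ in range(len(self.partitions)):
            idx = self._next_read % len(self.partitions)
            self._next_read += 1
            if self.partitions[idx]:
                return self.partitions[idx].popleft()
        return None  # buffer is empty
```

In this sketch a burst of `enqueue` calls grows the number of partitions without waiting for consumers, and any number of operator replicas can later call `dequeue` to drain the backlog. Global ordering across partitions is not preserved here; a real deployment would also need persistence and concurrency control, which the sketch leaves out.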
Notes
1. In this context we want a queue rather than a plain buffer: a "write once; read once" structure that supports enqueue and dequeue operations.
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Qu, W., Dessloch, S. (2017). A Lightweight Elastic Queue Middleware for Distributed Streaming Pipeline. In: Bellatreche, L., Chakravarthy, S. (eds) Big Data Analytics and Knowledge Discovery. DaWaK 2017. Lecture Notes in Computer Science(), vol 10440. Springer, Cham. https://doi.org/10.1007/978-3-319-64283-3_13
DOI: https://doi.org/10.1007/978-3-319-64283-3_13
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-64282-6
Online ISBN: 978-3-319-64283-3
eBook Packages: Computer Science, Computer Science (R0)