Adaptive Partitioning and Order-Preserved Merging of Data Streams

Pohl, Constantin; Sattler, Kai-Uwe

doi:10.1007/978-3-030-28730-6_17

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11695)

Included in the following conference series:

European Conference on Advances in Databases and Information Systems

Abstract

Partitioning is a key concept for utilizing modern hardware, especially for exploiting the parallelism of many-core CPUs. In data streaming scenarios where parameters such as tuple arrival rates can vary, adaptive partitioning strategies avoid overestimating or underestimating query workloads. While there are many ways to partition the data flow, threads that process partitions independently of each other inevitably produce unordered output. This is a considerable difficulty for applications where tuple order matters, such as stream reasoning or complex event processing.
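To make the ordering problem concrete, here is a minimal Python sketch. It is not the authors' implementation; the heuristic `choose_partition_count`, the round-robin split, and the helper `output_in_finish_order` are hypothetical illustrations. The partition count is derived from an observed arrival rate and an assumed per-partition capacity (re-evaluating it as the rate changes is what makes it adaptive), capped at 256 to mirror the largest configuration in the experiments.

```python
import math
from typing import List, Tuple

Item = Tuple[int, str]  # (global sequence number, payload)

MAX_PARTITIONS = 256  # mirrors the largest configuration in the experiments


def choose_partition_count(arrival_rate: float, per_partition_rate: float,
                           max_partitions: int = MAX_PARTITIONS) -> int:
    """Hypothetical heuristic: use just enough partitions to keep up with the input."""
    if per_partition_rate <= 0:
        raise ValueError("per_partition_rate must be positive")
    needed = math.ceil(arrival_rate / per_partition_rate)
    return max(1, min(needed, max_partitions))


def round_robin_split(tuples: List[Item], k: int) -> List[List[Item]]:
    """Assign tuples to k partitions in round-robin order (an assumed strategy)."""
    partitions: List[List[Item]] = [[] for _ in range(k)]
    for i, item in enumerate(tuples):
        partitions[i % k].append(item)
    return partitions


def output_in_finish_order(partitions: List[List[Item]],
                           finish_order: List[int]) -> List[Item]:
    """Concatenate partition results in the order their threads happen to finish."""
    return [item for p in finish_order for item in partitions[p]]
```

For example, `choose_partition_count(10_000, 3_000)` returns 4. Each partition keeps its own tuples in order, but if partition 1 finishes before partition 0, splitting sequence numbers 0 to 3 across two partitions yields the output order 1, 3, 0, 2, which a downstream merge step has to undo.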

In this paper, we address this problem by combining an adaptive partitioning approach with an order-preserving merge algorithm. Since reordering output tuples can only worsen latency, we focus mainly on query throughput while keeping the delay of individual tuples minimal. We run micro-benchmarks as well as the Linear Road benchmark, demonstrating the correctness and effectiveness of our approach while scaling out to up to 256 partitions on a single Xeon Phi many-core CPU.
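One way to restore order is a k-way merge over the partition outputs. The sketch below assumes each tuple carries a global sequence number and that every partition emits its own tuples in increasing order; the class `OrderPreservingMerger` and its methods are hypothetical names, and the paper's actual merge algorithm may differ. The key rule is that the smallest buffered tuple may be released only when every still-open partition has at least one tuple buffered, because an empty, open partition could still deliver a smaller sequence number.

```python
import heapq
from collections import deque
from typing import Deque, List, Tuple

Item = Tuple[int, str]  # (global sequence number, payload)


class OrderPreservingMerger:
    """Hypothetical k-way merge that restores global sequence order.

    Assumes every partition emits its own tuples in increasing sequence order.
    """

    def __init__(self, num_partitions: int) -> None:
        self.queues: List[Deque[Item]] = [deque() for _ in range(num_partitions)]
        self.closed = [False] * num_partitions
        # Open partitions with nothing buffered; while > 0, no tuple is safe to emit.
        self.starved = num_partitions
        # Min-heap of (sequence number, partition) for the head of each non-empty queue.
        self.heap: List[Tuple[int, int]] = []

    def push(self, partition: int, item: Item) -> List[Item]:
        """Buffer a tuple from one partition and return any tuples now safe to emit."""
        if self.closed[partition]:
            raise ValueError(f"partition {partition} is already closed")
        queue = self.queues[partition]
        if not queue:
            self.starved -= 1
            heapq.heappush(self.heap, (item[0], partition))
        queue.append(item)
        return self._drain()

    def close(self, partition: int) -> List[Item]:
        """Mark a partition as finished; it will not produce any further tuples."""
        if not self.closed[partition]:
            if not self.queues[partition]:
                self.starved -= 1
            self.closed[partition] = True
        return self._drain()

    def _drain(self) -> List[Item]:
        out: List[Item] = []
        while self.heap and self.starved == 0:
            _, partition = heapq.heappop(self.heap)
            queue = self.queues[partition]
            out.append(queue.popleft())
            if queue:
                heapq.heappush(self.heap, (queue[0][0], partition))
            elif not self.closed[partition]:
                self.starved += 1  # must wait for this partition again
        return out
```

This trades latency for order, as the abstract notes: a single slow partition stalls the merge until it produces a tuple or is closed. Keeping only each partition's head in the heap makes every emitted tuple cost O(log k) for k partitions.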

C. Pohl—This work was partially funded by the German Research Foundation (DFG) within the SPP2037 under grant no. SA 782/28.

Notes

1. Open source: https://github.com/dbis-ilm/pipefabric.
2. www.cs.brandeis.edu/~linearroad/datadriverinstall.html.

Author information

Authors and Affiliations

Databases and Information Systems Group, TU Ilmenau, Ilmenau, Germany
Constantin Pohl & Kai-Uwe Sattler

Corresponding author

Correspondence to Constantin Pohl.

Editor information

Editors and Affiliations

University of Maribor, Maribor, Slovenia
Tatjana Welzer
Alpen-Adria Universität Klagenfurt, Klagenfurt, Austria
Johann Eder
University of Maribor, Maribor, Slovenia
Vili Podgorelec
University of Maribor, Maribor, Slovenia
Aida Kamišalić Latifić

About this paper

Cite this paper

Pohl, C., Sattler, KU. (2019). Adaptive Partitioning and Order-Preserved Merging of Data Streams. In: Welzer, T., Eder, J., Podgorelec, V., Kamišalić Latifić, A. (eds) Advances in Databases and Information Systems. ADBIS 2019. Lecture Notes in Computer Science, vol 11695. Springer, Cham. https://doi.org/10.1007/978-3-030-28730-6_17

DOI: https://doi.org/10.1007/978-3-030-28730-6_17
Published: 13 August 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-28729-0
Online ISBN: 978-3-030-28730-6
eBook Packages: Computer Science, Computer Science (R0)