Abstract
In stream processing applications, accurately measuring a system’s processing capacity is critical for ensuring optimal performance and meeting Service Level Objectives (SLOs). Traditionally, operator throughput has been used as a proxy for the application’s state size, but this approach can be misleading for window-based applications. In this paper, we explore the impact of window selectivity on the performance of streaming applications. We demonstrate how a growing application state can artificially decrease operators’ throughput, producing false positives that could trigger premature scale-down decisions. To address this problem, we conduct empirical evaluations of the relationship between operator throughput and state size, showing that in window-based applications the state-size pattern typically does not correspond to the operator’s processing rate. Our findings highlight the importance of considering the application’s state size in performance monitoring and decision-making, particularly for window-based applications.
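The false-positive mechanism described above can be illustrated with a toy model. This is a minimal sketch, not the authors’ method or experimental setup: the linear cost model (per-record cost grows with the bytes of window state held), the fixed throughput threshold, and every name (`simulate_window`, `throughput_only_scale_down`, `state_aware_scale_down`) are hypothetical. It shows a throughput-only rule recommending a scale-down while window state is still growing, and a rule that also checks state growth holding off.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sample:
    """Metrics for one monitoring interval of a single window operator."""
    throughput: float  # observed processing rate, records per second of busy time
    state_bytes: int   # size of the operator's window state at the end of the interval


def simulate_window(intervals: int, records_per_interval: int, bytes_per_record: int,
                    base_cost_s: float, cost_per_byte_s: float,
                    window_intervals: int) -> List[Sample]:
    """Hypothetical cost model: per-record cost grows linearly with state size.

    State accumulates until the window fires every `window_intervals`
    intervals, at which point it is purged.
    """
    samples = []
    state = 0
    for i in range(intervals):
        if i > 0 and i % window_intervals == 0:
            state = 0  # window fired: its contents are emitted and cleared
        per_record_s = base_cost_s + cost_per_byte_s * state
        throughput = 1.0 / per_record_s  # rate falls as state grows
        state += records_per_interval * bytes_per_record
        samples.append(Sample(throughput, state))
    return samples


def throughput_only_scale_down(samples: List[Sample], threshold: float) -> bool:
    """Naive policy: scale down whenever the latest throughput is below threshold."""
    return samples[-1].throughput < threshold


def state_aware_scale_down(samples: List[Sample], threshold: float) -> bool:
    """Suppress scale-down while window state is still growing.

    A throughput drop that coincides with growing state is treated as a
    symptom of state growth rather than as evidence of spare capacity.
    """
    if len(samples) < 2:
        return throughput_only_scale_down(samples, threshold)
    state_growing = samples[-1].state_bytes > samples[-2].state_bytes
    return samples[-1].throughput < threshold and not state_growing


if __name__ == "__main__":
    # One six-interval window; it does not fire inside the observed span.
    run = simulate_window(intervals=6, records_per_interval=1000, bytes_per_record=100,
                          base_cost_s=1e-4, cost_per_byte_s=1e-10, window_intervals=6)
    for s in run:
        print(f"{s.throughput:8.1f} rec/s  state={s.state_bytes} B")
    print("throughput-only scale-down:", throughput_only_scale_down(run, 8000))
    print("state-aware scale-down:   ", state_aware_scale_down(run, 8000))
```

With these made-up parameters the processing rate falls from 10000 to about 6667 records/s within a single window even though the input rate never changes. The throughput-only rule therefore reports a scale-down, while the state-aware rule does not. Comparing only the last two samples is a deliberate simplification; a practical controller would likely smooth metrics over several windows.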
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Omoregbee, P., Forshaw, M., Thomas, N. (2023). A State-Size Inclusive Approach to Optimizing Stream Processing Applications. In: Iacono, M., Scarpa, M., Barbierato, E., Serrano, S., Cerotti, D., Longo, F. (eds) Computer Performance Engineering and Stochastic Modelling. EPEW ASMTA 2023. Lecture Notes in Computer Science, vol 14231. Springer, Cham. https://doi.org/10.1007/978-3-031-43185-2_22
DOI: https://doi.org/10.1007/978-3-031-43185-2_22
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-43184-5
Online ISBN: 978-3-031-43185-2
eBook Packages: Computer Science, Computer Science (R0)