A State-Size Inclusive Approach to Optimizing Stream Processing Applications

  • Conference paper
Computer Performance Engineering and Stochastic Modelling (EPEW 2023, ASMTA 2023)

Abstract

In stream processing applications, accurately measuring a system’s processing capacity is critical for ensuring optimal performance and meeting Service Level Objectives (SLOs). Traditionally, operator throughput has been used as a proxy for the application’s state size, but this approach can be misleading for window-based applications. In this paper, we explore the impact of window selectivity on the performance of streaming applications, demonstrating how a growing application state can artificially decrease the operators’ throughput, resulting in false positives that could trigger premature scaling-down decisions. To address this problem, we conduct empirical evaluations of the relationship between operators’ throughput and state size, showing that the state-size pattern typically does not correspond to the operator’s processing rate in window-based applications. Our findings highlight the importance of considering the application’s state size in performance monitoring and decision-making, particularly for window-based applications.

Author information

Corresponding author

Correspondence to Paul Omoregbee.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Omoregbee, P., Forshaw, M., Thomas, N. (2023). A State-Size Inclusive Approach to Optimizing Stream Processing Applications. In: Iacono, M., Scarpa, M., Barbierato, E., Serrano, S., Cerotti, D., Longo, F. (eds) Computer Performance Engineering and Stochastic Modelling. EPEW 2023, ASMTA 2023. Lecture Notes in Computer Science, vol 14231. Springer, Cham. https://doi.org/10.1007/978-3-031-43185-2_22

  • DOI: https://doi.org/10.1007/978-3-031-43185-2_22

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-43184-5

  • Online ISBN: 978-3-031-43185-2

  • eBook Packages: Computer Science, Computer Science (R0)
