Abstract
Distributed Stream Processing (DSP) systems have recently attracted much attention because of their ability to process huge volumes of real-time stream data with very low latency on clusters of commodity hardware. Existing workload grouping strategies in DSP systems fall into four categories: raw and blind, data skewness, cluster heterogeneity, and dynamic load-aware. However, these traditional stream grouping strategies do not consider the network distance between two communicating operators. In fact, traffic over different network channels has a significant impact on performance. How to group tuples according to network distance to improve performance has therefore become a critical problem.
In this paper, we propose a network-aware grouping framework called Squirrel to improve performance under different network distances. By identifying the network locations of two communicating operators, Squirrel sets a weight and a priority for each network channel. It introduces Weight Grouping to assign different numbers of tuples to each network channel according to the channel’s weight and priority. To adapt to changes in network conditions, input load, resources, and other factors, Squirrel uses Dynamic Weight Control to adjust each network channel’s weight and priority online by analyzing runtime information. Experimental results demonstrate Squirrel’s effectiveness and show that Squirrel can achieve a 1.67x improvement in throughput and reduce latency by 47%.
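The idea of sending different numbers of tuples over channels with different weights can be illustrated with a small sketch. The code below is not Squirrel's algorithm, which the abstract does not specify; it swaps in a generic smooth weighted round-robin as a stand-in. The class name `WeightedGrouper`, the channel names, the weight values, and the `set_weight` hook (a placeholder for an online adjustment such as Dynamic Weight Control) are all assumptions made for illustration, and channel priority is not modeled.

```python
from collections import Counter


class WeightedGrouper:
    """Smooth weighted round-robin over network channels (illustrative only)."""

    def __init__(self, weights):
        # weights: channel name -> positive integer weight (hypothetical values)
        self.weights = dict(weights)
        self.current = {ch: 0 for ch in self.weights}

    def choose(self):
        # Every channel accumulates its weight; the channel with the largest
        # accumulator is picked and reduced by the total weight, so picks
        # interleave in proportion to the weights.
        total = sum(self.weights.values())
        for ch, w in self.weights.items():
            self.current[ch] += w
        best = max(self.current, key=self.current.get)
        self.current[best] -= total
        return best

    def set_weight(self, channel, weight):
        # Hypothetical hook: an online controller could call this when
        # runtime statistics suggest a channel should carry more or less load.
        self.weights[channel] = weight


# Assumed channel names, ordered from shortest to longest network distance.
grouper = WeightedGrouper({"intra_process": 5, "intra_node": 3, "inter_node": 1})
counts = Counter(grouper.choose() for _ in range(900))
print(counts)
```

With these assumed weights, the three channels receive tuples in a 5:3:1 ratio. Smooth weighted round-robin was chosen for the sketch because it spreads picks evenly over time instead of sending a long burst to the heaviest channel.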
Acknowledgment
This work was supported by the National Key Research and Development Program under grant 2018YFB1003600 and the Pre-research Project of Beifang under grant FFZ-1601.
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Chen, F., Wu, S., Jin, H. (2018). Network-Aware Grouping in Distributed Stream Processing Systems. In: Vaidya, J., Li, J. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2018. Lecture Notes in Computer Science, vol 11334. Springer, Cham. https://doi.org/10.1007/978-3-030-05051-1_1
DOI: https://doi.org/10.1007/978-3-030-05051-1_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-05050-4
Online ISBN: 978-3-030-05051-1
eBook Packages: Computer Science (R0)