Network-Aware Grouping in Distributed Stream Processing Systems

  • Conference paper
Algorithms and Architectures for Parallel Processing (ICA3PP 2018)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 11334)

Abstract

Distributed Stream Processing (DSP) systems have recently attracted much attention because of their ability to process huge volumes of real-time stream data with very low latency on clusters of commodity hardware. Existing workload grouping strategies in DSP systems can be classified into four categories: raw and blind, data skewness, cluster heterogeneity, and dynamic load-aware. However, these traditional stream grouping strategies do not consider the network distance between two communicating operators. In fact, traffic over different network channels has a significant impact on performance. How to group tuples according to network distance to improve performance has therefore become a critical problem.

In this paper, we propose a network-aware grouping framework called Squirrel to improve performance under different network distances. By identifying the network locations of two communicating operators, Squirrel sets a weight and a priority for each network channel. It introduces Weight Grouping to assign different numbers of tuples to each network channel according to the channel’s weight and priority. To adapt to changes in network conditions, input load, resources, and other factors, Squirrel uses Dynamic Weight Control to adjust each channel’s weight and priority online by analyzing runtime information. Experimental results demonstrate Squirrel’s effectiveness and show that Squirrel can achieve a 1.67x improvement in throughput and reduce latency by 47%.
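The abstract does not give the details of Weight Grouping or Dynamic Weight Control, so the following is only a minimal sketch of the general idea, not Squirrel's actual algorithm. It assumes smooth weighted round-robin as the dispatch rule, uses priority purely as a tie-breaker (lower value wins), and recomputes weights as inversely proportional to an observed per-channel latency. The channel labels, the `Channel` class, and the functions `pick_channel` and `rebalance` are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class Channel:
    name: str         # hypothetical label, e.g. "inter-node"
    weight: int       # relative share of tuples this channel should receive
    priority: int     # assumed convention: lower value wins ties
    current: int = 0  # running credit used by smooth weighted round-robin


def pick_channel(channels):
    """Choose the channel for the next tuple (smooth weighted round-robin)."""
    total = sum(c.weight for c in channels)
    for c in channels:
        c.current += c.weight
    # The channel with the most credit wins; priority breaks ties.
    best = max(channels, key=lambda c: (c.current, -c.priority))
    best.current -= total
    return best


def rebalance(channels, latency_ms, scale=100):
    """Illustrative stand-in for online weight adjustment.

    Assumption: a channel's weight is set inversely proportional to its
    observed latency, with a floor of 1 so no channel is starved.
    """
    for c in channels:
        c.weight = max(1, round(scale / latency_ms[c.name]))
        c.current = 0  # restart the round-robin cycle with the new weights


channels = [
    Channel("intra-process", weight=5, priority=0),
    Channel("inter-process", weight=3, priority=1),
    Channel("inter-node", weight=2, priority=2),
]

# Dispatch one full cycle of tuples (10 = sum of weights) and count them.
counts = {c.name: 0 for c in channels}
for _ in range(10):
    counts[pick_channel(channels).name] += 1
```

Over one full cycle each channel receives a number of tuples equal to its weight, and the picks are interleaved rather than sent in bursts. Calling `rebalance` with fresh latency measurements changes the shares for subsequent tuples; a real system would likely smooth the measurements and bound how quickly weights may change.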

Acknowledgment

This work was supported by the National Key Research and Development Program under grant 2018YFB1003600 and the Pre-research Project of Beifang under grant FFZ-1601.

Author information

Corresponding author

Correspondence to Song Wu.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Chen, F., Wu, S., Jin, H. (2018). Network-Aware Grouping in Distributed Stream Processing Systems. In: Vaidya, J., Li, J. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2018. Lecture Notes in Computer Science, vol 11334. Springer, Cham. https://doi.org/10.1007/978-3-030-05051-1_1

  • DOI: https://doi.org/10.1007/978-3-030-05051-1_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-05050-4

  • Online ISBN: 978-3-030-05051-1

  • eBook Packages: Computer Science (R0)
