Elastic Resource Provisioning for Batched Stream Processing System in Container Cloud

  • Conference paper
Web and Big Data (APWeb-WAIM 2017)

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 10366))

Abstract

Batched stream processing systems achieve higher throughput than traditional stream processing systems while still providing low-latency guarantees. Such systems are increasingly deployed in the cloud because they require elasticity and cost efficiency. However, their performance is hard to guarantee in the cloud, because static resource provisioning cannot accommodate stream fluctuation and uneven workload distribution. In this paper, we propose EStream, an elastic batched stream processing system based on Spark Streaming that transparently adjusts the available resources to handle workload fluctuation and uneven workload distribution in a container cloud. Specifically, EStream automatically scales the cluster when it detects resource insufficiency or over-provisioning under workload fluctuation. It also schedules resources within the cluster according to the workload distribution. Experimental results show that, compared with the original Spark Streaming, EStream handles workload fluctuation and uneven distribution transparently and improves resource efficiency.
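The abstract does not specify how resource insufficiency or over-provisioning is detected, so the following is only a minimal sketch of one plausible rule, not EStream's actual algorithm. It relies on the general property of batched stream processing that a job keeps up only when each batch is processed within the batch interval. The names `ScalingPolicy` and `decide_executors`, the thresholds, and the one-executor step size are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Sequence


@dataclass(frozen=True)
class ScalingPolicy:
    """Illustrative thresholds; none of these values come from the paper."""
    batch_interval_s: float = 2.0   # assumed batch interval in seconds
    scale_out_ratio: float = 0.9    # load above this suggests insufficiency
    scale_in_ratio: float = 0.5     # load below this suggests over-provisioning
    min_executors: int = 1
    max_executors: int = 16


def decide_executors(current: int,
                     processing_times_s: Sequence[float],
                     policy: ScalingPolicy) -> int:
    """Return a target executor count from recent batch processing times."""
    if not processing_times_s:
        # No observations yet: keep the current allocation.
        return current

    # Load is the mean processing time as a fraction of the batch interval.
    load = mean(processing_times_s) / policy.batch_interval_s

    if load > policy.scale_out_ratio:
        target = current + 1        # batches risk falling behind: scale out
    elif load < policy.scale_in_ratio:
        target = current - 1        # resources sit idle: scale in
    else:
        target = current            # within the comfortable band: hold

    # Clamp the result to the allowed cluster size.
    return max(policy.min_executors, min(policy.max_executors, target))
```

For example, under these assumed defaults, calling `decide_executors(4, [1.9, 2.1, 2.0], ScalingPolicy())` asks for one more executor because batches take about as long as the interval itself. The gap between the two thresholds is a deliberate choice in this sketch: it keeps the cluster from oscillating when the load hovers around a single cut-off.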

Notes

  1. http://docker.com/

Acknowledgments

This research is supported by the National Key Research and Development Program under grant No. 2016YFB1000501, the 863 Hi-Tech Research and Development Program under grant No. 2015AA01A203, and the National Science Foundation of China under grant No. 61232008. This work is also supported by the Anhui Natural Science Foundation of China under grant No. 1608085QF147, and the Key Project of Support Program for Excellent Youth Scholars in Colleges and Universities of Anhui Province (No. gxyqZD2016332).

Author information

Corresponding author

Correspondence to Song Wu.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Wu, S., Wang, X., Jin, H., Chen, H. (2017). Elastic Resource Provisioning for Batched Stream Processing System in Container Cloud. In: Chen, L., Jensen, C., Shahabi, C., Yang, X., Lian, X. (eds) Web and Big Data. APWeb-WAIM 2017. Lecture Notes in Computer Science, vol 10366. Springer, Cham. https://doi.org/10.1007/978-3-319-63579-8_32

  • DOI: https://doi.org/10.1007/978-3-319-63579-8_32

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-63578-1

  • Online ISBN: 978-3-319-63579-8

  • eBook Packages: Computer Science, Computer Science (R0)
