DOI: 10.1145/2806416.2806449
research-article

Dynamic Resource Management In a Massively Parallel Stream Processing Engine

Published: 17 October 2015

ABSTRACT

The emerging interest in Massively Parallel Stream Processing Engines (MPSPEs), which can run long-standing computations over data streams of ever-growing velocity on a large-scale cluster, calls for efficient dynamic resource management techniques that avoid both wasted resources and excessive processing latency. In this paper, we propose an approach that integrates dynamic resource management with passive fault-tolerance mechanisms in an MPSPE, so that the checkpoints prepared for failure recovery can also be harvested to make dynamic load migrations more efficient. To maximize the opportunity to reuse checkpoints for fast load migration, we formally define a checkpoint allocation problem and provide a pragmatic algorithm to solve it. We implement all the proposed techniques on top of Apache Storm, an open-source MPSPE, and conduct extensive experiments on a real dataset to examine various aspects of our techniques. The results show that our techniques can greatly improve the efficiency of dynamic resource reconfiguration without imposing significant overhead or latency on normal job execution.

Published in

CIKM '15: Proceedings of the 24th ACM International on Conference on Information and Knowledge Management
October 2015
1998 pages
ISBN: 9781450337946
DOI: 10.1145/2806416

Copyright © 2015 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery, New York, NY, United States

Acceptance Rates

CIKM '15 paper acceptance rate: 165 of 646 submissions, 26%
Overall acceptance rate: 1,861 of 8,427 submissions, 22%
