Dynamic Resource Management In a Massively Parallel Stream Processing Engine

Authors:
Kasper Grud Skat Madsen

University of Southern Denmark, Odense, Denmark

Yongluan Zhou

University of Southern Denmark, Odense, Denmark

CIKM '15: Proceedings of the 24th ACM International on Conference on Information and Knowledge Management, October 2015, Pages 13–22. https://doi.org/10.1145/2806416.2806449

Published: 17 October 2015

ABSTRACT

The emerging interest in Massively Parallel Stream Processing Engines (MPSPEs), which process long-standing computations over data streams of ever-growing velocity on a large-scale cluster, calls for efficient dynamic resource management techniques that avoid wasted resources and excessive processing latency. In this paper, we propose an approach that integrates dynamic resource management with passive fault-tolerance mechanisms in an MPSPE, so that the checkpoints prepared for failure recovery can also be harvested to make dynamic load migration more efficient. To maximize the opportunity to reuse checkpoints for fast load migration, we formally define a checkpoint allocation problem and provide a pragmatic algorithm to solve it. We implement all the proposed techniques on top of Apache Storm, an open-source MPSPE, and conduct extensive experiments on a real dataset to examine various aspects of our techniques. The results show that our techniques can greatly improve the efficiency of dynamic resource reconfiguration without imposing significant overhead or latency on normal job execution.

Index Terms

1. Information systems
  1. Data management systems
    1. Database management system engines

Published in
CIKM '15: Proceedings of the 24th ACM International on Conference on Information and Knowledge Management
October 2015
1998 pages
ISBN:9781450337946
DOI:10.1145/2806416
General Chairs:
James Bailey, The University of Melbourne
Alistair Moffat, The University of Melbourne
Program Chairs:
Charu C. Aggarwal, IBM
Maarten de Rijke, University of Amsterdam
Ravi Kumar, Google
Vanessa Murdock, Microsoft
Timos Sellis, RMIT University
Jeffrey Xu Yu, Chinese University of Hong Kong
Copyright © 2015 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
elasticity
fault-tolerance
resource management
Qualifiers
- research-article
Acceptance Rates
CIKM '15 Paper Acceptance Rate: 165 of 646 submissions, 26%
Overall Acceptance Rate: 1,861 of 8,427 submissions, 22%