DOI: 10.1145/2287076.2287082
research-article

Locality-aware dynamic VM reconfiguration on MapReduce clouds

Published: 18 June 2012

Abstract

Cloud computing based on system virtualization has been expanding its services to distributed data-intensive platforms such as MapReduce and Hadoop. Such a distributed platform runs on a cloud in a virtual cluster consisting of a number of virtual machines (VMs). In the virtual cluster, the demand on computing resources at each node may fluctuate because of data locality and task behavior. However, current cloud services use a static cluster configuration, in which the computing capability of each VM is fixed or adjusted manually. A fixed, homogeneous VM configuration may not adapt to changing resource demands at individual nodes.
In this paper, we propose a dynamic VM reconfiguration technique for data-intensive computing on clouds, called Dynamic Resource Reconfiguration (DRR). DRR adjusts the computing capability of individual VMs to maximize resource utilization. Among the several factors that cause resource imbalance on Hadoop platforms, this paper focuses on data locality. Although assigning tasks to the nodes that hold their input data can significantly improve the overall performance of a job, the fixed computing capability of each node may not allow such locality-aware scheduling. DRR dynamically increases or decreases the computing capability of each node to enhance locality-aware task scheduling. We evaluate the potential performance improvement of DRR on a 100-node cluster, and its detailed behavior on a small-scale private cluster with constrained network bandwidth. DRR can improve the throughput of Hadoop jobs by 15% on average on the 100-node cluster, and by 41% on the private cluster with constrained network bandwidth.
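The idea of shifting computing capability toward nodes that hold pending input data can be illustrated with a small toy model. The sketch below is not the paper's algorithm: the `VM` record, the grouping of VMs by physical host, the rule that a donor keeps at least one vCPU, and the function names `local_runs` and `reconfigure` are all assumptions made for illustration. It only shows why moving idle vCPUs to VMs with more local tasks than slots can raise the number of tasks that run next to their data.

```python
from dataclasses import dataclass


@dataclass
class VM:
    name: str
    host: str         # physical machine the VM runs on (assumed grouping)
    vcpus: int        # currently assigned virtual CPUs, treated as task slots
    local_tasks: int  # pending tasks whose input data is stored on this VM


def local_runs(vms):
    """Count the tasks that can run on the VM holding their input data."""
    return sum(min(vm.vcpus, vm.local_tasks) for vm in vms)


def reconfigure(vms):
    """Move idle vCPUs to overloaded VMs on the same host (hypothetical policy).

    The total number of vCPUs per host is preserved, and every donor
    keeps at least one vCPU.
    """
    by_host = {}
    for vm in vms:
        by_host.setdefault(vm.host, []).append(vm)

    for group in by_host.values():
        donors = [vm for vm in group if vm.vcpus > vm.local_tasks]
        takers = [vm for vm in group if vm.vcpus < vm.local_tasks]
        for taker in takers:
            for donor in donors:
                spare = donor.vcpus - max(donor.local_tasks, 1)
                need = taker.local_tasks - taker.vcpus
                moved = max(0, min(spare, need))
                donor.vcpus -= moved
                taker.vcpus += moved
    return vms


def demo_cluster():
    """Build a made-up four-VM cluster with a static two-vCPU configuration."""
    return [
        VM("vm1", "hostA", vcpus=2, local_tasks=4),
        VM("vm2", "hostA", vcpus=2, local_tasks=0),
        VM("vm3", "hostB", vcpus=2, local_tasks=3),
        VM("vm4", "hostB", vcpus=2, local_tasks=1),
    ]


if __name__ == "__main__":
    cluster = demo_cluster()
    print("local tasks, static configuration:", local_runs(cluster))
    reconfigure(cluster)
    print("local tasks, after reconfiguration:", local_runs(cluster))
```

In this made-up cluster, the static configuration lets 5 of the 8 pending tasks run locally, while the reconfigured one lets 7 run locally with the same total of 8 vCPUs. The numbers come from the toy input and say nothing about the throughput gains reported in the abstract.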

Published In

HPDC '12: Proceedings of the 21st international symposium on High-Performance Parallel and Distributed Computing
June 2012
308 pages
ISBN:9781450308052
DOI:10.1145/2287076
General Chair: Dick Epema
Program Chairs: Thilo Kielmann, Matei Ripeanu
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. MapReduce
  2. cloud computing
  3. virtual clusters

Qualifiers

  • Research-article

Conference

HPDC'12

Acceptance Rates

HPDC '12 paper acceptance rate: 23 of 143 submissions (16%)
Overall acceptance rate: 166 of 966 submissions (17%)


Cited By

  • (2022) Data Locality in High Performance Computing, Big Data, and Converged Systems: An Analysis of the Cutting Edge and a Future System Architecture. Electronics 12:1 (53). DOI: 10.3390/electronics12010053. Online publication date: 23-Dec-2022
  • (2022) Energy Utilization Task Scheduling for MapReduce in Heterogeneous Clusters. IEEE Transactions on Services Computing 15:2 (931-944). DOI: 10.1109/TSC.2020.2966697. Online publication date: 1-Mar-2022
  • (2021) RIBBON. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (1-13). DOI: 10.1145/3458817.3476168. Online publication date: 14-Nov-2021
  • (2021) Budget Constraint Scheduler for Big Data Using Hadoop MapReduce. SN Computer Science 2:4. DOI: 10.1007/s42979-021-00638-0. Online publication date: 30-Apr-2021
  • (2021) Stochastic distributed data stream partitioning using task locality: design, implementation, and optimization. The Journal of Supercomputing. DOI: 10.1007/s11227-021-03725-4. Online publication date: 24-Mar-2021
  • (2021) Meta-X: A Technique for Reducing Communication in Geographically Distributed Computations. Cyber Security Cryptography and Machine Learning (467-486). DOI: 10.1007/978-3-030-78086-9_34. Online publication date: 1-Jul-2021
  • (2020) Security as a Service Platform Leveraging Multi-Access Edge Computing Infrastructure Provisions. ICC 2020 - 2020 IEEE International Conference on Communications (ICC) (1-6). DOI: 10.1109/ICC40277.2020.9148660. Online publication date: Jun-2020
  • (2019) Virtual cluster optimisation for MapReduce-like applications. International Journal of High Performance Computing and Networking 13:4 (378-388). DOI: 10.5555/3337625.3337628. Online publication date: 1-Jan-2019
  • (2019) Stream Data Load Prediction for Resource Scaling Using Online Support Vector Regression. Algorithms 12:2 (37). DOI: 10.3390/a12020037. Online publication date: 14-Feb-2019
  • (2019) Survey of Data Locality in Apache Hadoop. 2019 IEEE International Conference on Big Data, Cloud Computing, Data Science & Engineering (BCD) (46-53). DOI: 10.1109/BCD.2019.8885148. Online publication date: May-2019
