Locality-aware dynamic VM reconfiguration on MapReduce clouds

Authors:

Seungryoul Maeng

HPDC '12: Proceedings of the 21st international symposium on High-Performance Parallel and Distributed Computing

Pages 27 - 36

https://doi.org/10.1145/2287076.2287082

Published: 18 June 2012

Abstract

Cloud computing, based on system virtualization, has been expanding its services to distributed data-intensive platforms such as MapReduce and Hadoop. Such a distributed platform on clouds runs in a virtual cluster consisting of a number of virtual machines. In the virtual cluster, the demand for computing resources on each node may fluctuate due to data locality and task behavior. However, current cloud services use a static cluster configuration, fixing or manually adjusting the computing capability of each virtual machine (VM). The fixed homogeneous VM configuration may not adapt to changing resource demands in individual nodes.

In this paper, we propose a dynamic VM reconfiguration technique for data-intensive computing on clouds, called Dynamic Resource Reconfiguration (DRR). DRR can adjust the computing capability of individual VMs to maximize resource utilization. Among the several factors that cause resource imbalance on Hadoop platforms, this paper focuses on data locality. Although assigning tasks to the nodes that contain their input data can significantly improve the overall performance of a job, the fixed computing capability of each node may not allow such locality-aware scheduling. DRR dynamically increases or decreases the computing capability of each node to enhance locality-aware task scheduling. We evaluate the potential performance improvement of DRR on a 100-node cluster, and its detailed behavior on a small-scale private cluster with constrained network bandwidth. DRR can improve the throughput of Hadoop jobs by 15% on average on the 100-node cluster, and by 41% on the private cluster with the constrained network connection.
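The idea of resizing a node so that a task can run next to its input data can be illustrated with a small sketch. This is not the authors' DRR implementation, which the abstract does not detail. It is a toy model under loudly stated assumptions: all VMs share one physical host, computing capability is counted in whole virtual CPUs, and the names `VM`, `schedule`, and `min_vcpus` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class VM:
    """Toy model of a virtual cluster node (hypothetical, not from the paper)."""
    name: str
    vcpus: int                 # current computing capability, in virtual CPUs
    running: int = 0           # tasks currently running on this VM
    local_blocks: set = field(default_factory=set)  # input blocks stored here

    @property
    def free_slots(self) -> int:
        return self.vcpus - self.running


def schedule(task_block, vms, min_vcpus=1):
    """Place one task, preferring a node that stores its input block.

    Returns (vm_name, was_local, donor_name). Assumes every VM in `vms`
    runs on the same physical host, so a vCPU can move between any two.
    """
    holders = [vm for vm in vms if task_block in vm.local_blocks]

    # 1. A node holding the data already has spare capacity.
    for vm in holders:
        if vm.free_slots > 0:
            vm.running += 1
            return vm.name, True, None

    # 2. Reconfigure: take one idle vCPU from another VM and give it
    #    to a node holding the data, so the task still runs locally.
    donors = [vm for vm in vms
              if vm not in holders and vm.free_slots > 0 and vm.vcpus > min_vcpus]
    if holders and donors:
        donor = max(donors, key=lambda vm: vm.free_slots)
        target = holders[0]
        donor.vcpus -= 1
        target.vcpus += 1
        target.running += 1
        return target.name, True, donor.name

    # 3. Fall back to any free slot; the input must be read remotely.
    for vm in vms:
        if vm.free_slots > 0:
            vm.running += 1
            return vm.name, False, None

    return None, False, None   # cluster is fully busy
```

In this sketch the total number of virtual CPUs stays constant, so a reconfiguration only shifts capacity toward the node where the data lives. A static configuration would stop after step 1 and go straight to the remote fallback in step 3.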

Index Terms

Locality-aware dynamic VM reconfiguration on MapReduce clouds
1. Software and its engineering
  1. Software organization and properties
    1. Software system structures
      1. Distributed systems organizing principles

Information & Contributors

Information

Published In

HPDC '12: Proceedings of the 21st international symposium on High-Performance Parallel and Distributed Computing

June 2012

308 pages

ISBN: 9781450308052

DOI: 10.1145/2287076

General Chair: Dick Epema, Delft University of Technology and Eindhoven University of Technology, The Netherlands

Program Chairs: Thilo Kielmann, Vrije Universiteit, The Netherlands; Matei Ripeanu, The University of British Columbia, Canada

Copyright © 2012 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

University of Arizona
SIGARCH: ACM Special Interest Group on Computer Architecture

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

HPDC'12

HPDC'12: The 21st International Symposium on High-Performance Parallel and Distributed Computing

June 18 - 22, 2012

Delft, The Netherlands

Acceptance Rates

HPDC '12 Paper Acceptance Rate 23 of 143 submissions, 16%;

Overall Acceptance Rate 166 of 966 submissions, 17%

Bibliometrics & Citations

Bibliometrics

Total Citations: 48
Total Downloads: 830
Downloads (Last 12 months): 5
Downloads (Last 6 weeks): 0

Reflects downloads up to 03 Mar 2025

Cited By

Usman S, Mehmood R, Katib I, Albeshri A (2022). Data Locality in High Performance Computing, Big Data, and Converged Systems: An Analysis of the Cutting Edge and a Future System Architecture. Electronics 12:1 (53). Online publication date: 23-Dec-2022.
https://doi.org/10.3390/electronics12010053
Wang J, Li X, Ruiz R, Yang J, Chu D (2022). Energy Utilization Task Scheduling for MapReduce in Heterogeneous Clusters. IEEE Transactions on Services Computing 15:2 (931-944). Online publication date: 1-Mar-2022.
https://doi.org/10.1109/TSC.2020.2966697
Li B, Roy R, Patel T, Gadepally V, Gettings K, Tiwari D, de Supinski B, Hall M, Gamblin T (2021). RIBBON. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (1-13). Online publication date: 14-Nov-2021.
https://dl.acm.org/doi/10.1145/3458817.3476168
Vinutha D, Raju G (2021). Budget Constraint Scheduler for Big Data Using Hadoop MapReduce. SN Computer Science 2:4. Online publication date: 30-Apr-2021.
https://doi.org/10.1007/s42979-021-00638-0
Son S, Im H, Moon Y (2021). Stochastic distributed data stream partitioning using task locality: design, implementation, and optimization. The Journal of Supercomputing. Online publication date: 24-Mar-2021.
https://doi.org/10.1007/s11227-021-03725-4
Afrati F, Dolev S, Sharma S, Ullman J (2021). Meta-X: A Technique for Reducing Communication in Geographically Distributed Computations. Cyber Security Cryptography and Machine Learning (467-486). Online publication date: 1-Jul-2021.
https://doi.org/10.1007/978-3-030-78086-9_34
Ranaweera P, Imrith V, Liyanag M, Jurcut A (2020). Security as a Service Platform Leveraging Multi-Access Edge Computing Infrastructure Provisions. ICC 2020 - 2020 IEEE International Conference on Communications (ICC) (1-6). Online publication date: Jun-2020.
https://doi.org/10.1109/ICC40277.2020.9148660
(2019). Virtual cluster optimisation for MapReduce-like applications. International Journal of High Performance Computing and Networking 13:4 (378-388). Online publication date: 1-Jan-2019.
https://dl.acm.org/doi/10.5555/3337625.3337628
Hu Z, Kang H, Zheng M (2019). Stream Data Load Prediction for Resource Scaling Using Online Support Vector Regression. Algorithms 12:2 (37). Online publication date: 14-Feb-2019.
https://doi.org/10.3390/a12020037
Lee S, Jo J, Kim Y (2019). Survey of Data Locality in Apache Hadoop. 2019 IEEE International Conference on Big Data, Cloud Computing, Data Science & Engineering (BCD) (46-53). Online publication date: May-2019.
https://doi.org/10.1109/BCD.2019.8885148
