DOI: 10.1145/2751205.2751236
research-article

Hadoop+: Modeling and Evaluating the Heterogeneity for MapReduce Applications in Heterogeneous Clusters

Published: 08 June 2015

Abstract

Despite the widespread adoption of heterogeneous clusters in modern data centers, modeling heterogeneity remains a major challenge, especially for large-scale MapReduce applications. In a heterogeneous CPU/GPU cluster, allocating more computing resources to a MapReduce application does not always yield better performance, because simultaneously running CPU and GPU tasks contend for shared resources.
This paper proposes a heterogeneity model that predicts the shared-resource contention among the simultaneously running tasks of a MapReduce application when heterogeneous computing resources (e.g., CPUs and GPUs) are allocated to it. To support the approach, we present a heterogeneous MapReduce framework, Hadoop+, which enables CPUs and GPUs to process big data in a coordinated manner and leverages the heterogeneity model to help users select computing resources for different purposes.
Our experimental results show three benefits. First, Hadoop+ exploits GPU capability, achieving 1.4x to 16.1x speedups over Hadoop for five real applications when each runs individually. Second, the heterogeneity model can be used to allocate GPUs among multiple simultaneously running MapReduce applications, yielding up to 36.9% (17.6% on average) speedup. Third, the model is shown to select the optimal or most cost-effective resource allocation.
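
To make the idea of contention-aware resource selection concrete, the following is a minimal, hypothetical sketch in Python (not Hadoop+'s actual model or API): it assumes per-slot processing rates for CPU and GPU tasks, a GPU cost factor, and a simple multiplicative contention penalty, then enumerates allocations to find the fastest and the most cost-effective one. All names, rates, and the penalty form are illustrative assumptions.

# Hypothetical sketch (not Hadoop+'s actual model or API): picking a CPU/GPU
# allocation for a MapReduce job when co-running CPU and GPU tasks contend
# for shared resources. The per-slot rates, the GPU cost factor, and the
# multiplicative contention penalty are illustrative assumptions.

def estimated_throughput(cpu_slots, gpu_slots,
                         cpu_rate=1.0, gpu_rate=8.0, contention=0.15):
    # Aggregate rate of all slots, discounted once per co-running CPU/GPU pair.
    raw = cpu_slots * cpu_rate + gpu_slots * gpu_rate
    penalty = (1.0 - contention) ** min(cpu_slots, gpu_slots)
    return raw * penalty

def pick_allocation(max_cpu, max_gpu, gpu_cost_per_slot=4.0):
    # Enumerate feasible allocations; track the fastest one and the one with
    # the best throughput-per-cost ratio.
    best_perf = best_value = None
    for c in range(max_cpu + 1):
        for g in range(max_gpu + 1):
            if c + g == 0:
                continue
            t = estimated_throughput(c, g)
            cost = c + g * gpu_cost_per_slot  # GPUs assumed pricier per slot
            if best_perf is None or t > best_perf[0]:
                best_perf = (t, c, g)
            if best_value is None or t / cost > best_value[0]:
                best_value = (t / cost, c, g)
    return best_perf[1:], best_value[1:]

if __name__ == "__main__":
    fastest, cheapest = pick_allocation(max_cpu=12, max_gpu=2)
    print("fastest allocation (cpu, gpu):", fastest)
    print("most cost-effective allocation (cpu, gpu):", cheapest)

In Hadoop+ itself, the contention term would come from the heterogeneity model's prediction for the co-running tasks rather than a fixed constant.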

    Published In

    ICS '15: Proceedings of the 29th ACM on International Conference on Supercomputing
    June 2015
    446 pages
    ISBN:9781450335591
    DOI:10.1145/2751205

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 08 June 2015

    Qualifiers

    • Research-article

    Funding Sources

    • National High Technology Research and Development Program of China
    • National Basic Research Program of China
    • NSFC
    • Australian Research Council (ARC) Grants

    Conference

    ICS'15: 2015 International Conference on Supercomputing
    June 8 - 11, 2015
    Newport Beach, California, USA

    Acceptance Rates

    ICS '15 Paper Acceptance Rate 40 of 160 submissions, 25%;
    Overall Acceptance Rate 629 of 2,180 submissions, 29%


    Cited By

    • (2019) PPOpenCL: a performance-portable OpenCL compiler with host and kernel thread code fusion. Proceedings of the 28th International Conference on Compiler Construction, pp. 2-16. DOI: 10.1145/3302516.3307350. Online publication date: 16-Feb-2019.
    • (2018) Zwift. Proceedings of the 2018 International Conference on Supercomputing, pp. 195-206. DOI: 10.1145/3205289.3205325. Online publication date: 12-Jun-2018.
    • (2018) Revisiting Loop Tiling for Datacenters. Proceedings of the 2018 International Conference on Supercomputing, pp. 328-340. DOI: 10.1145/3205289.3205306. Online publication date: 12-Jun-2018.
    • (2018) XOS: An Application-Defined Operating System for Datacenter Computing. 2018 IEEE International Conference on Big Data (Big Data), pp. 398-407. DOI: 10.1109/BigData.2018.8622507. Online publication date: Dec-2018.
    • (2017) MapReduce and Its Applications, Challenges, and Architecture. Journal of Grid Computing, 15(3), pp. 295-321. DOI: 10.1007/s10723-017-9408-0. Online publication date: 1-Sep-2017.
    • (2017) Architecture for the Execution of Tasks in Apache Spark in Heterogeneous Environments. Euro-Par 2016: Parallel Processing Workshops, pp. 504-515. DOI: 10.1007/978-3-319-58943-5_41. Online publication date: 28-May-2017.
    • (2016) Bridging the Semantic Gaps of GPU Acceleration for Scale-out CNN-based Big Data Processing. Proceedings of the 2016 International Conference on Parallel Architectures and Compilation, pp. 315-326. DOI: 10.1145/2967938.2967944. Online publication date: 11-Sep-2016.
    • (2016) Spark-GPU: An accelerated in-memory data processing engine on clusters. 2016 IEEE International Conference on Big Data (Big Data), pp. 273-283. DOI: 10.1109/BigData.2016.7840613. Online publication date: Dec-2016.
    • (2016) High-performance computing environment: a review of twenty years of experiments in China. National Science Review, 3(1), pp. 36-48. DOI: 10.1093/nsr/nww001. Online publication date: 20-Jan-2016.
