DOI: 10.1145/2751205.2751236
research-article

Hadoop+: Modeling and Evaluating the Heterogeneity for MapReduce Applications in Heterogeneous Clusters

Published: 08 June 2015

Abstract

Despite the widespread adoption of heterogeneous clusters in modern data centers, modeling heterogeneity remains a major challenge, especially for large-scale MapReduce applications. In a heterogeneous CPU/GPU cluster, allocating more computing resources to a MapReduce application does not always yield better performance, because simultaneously running CPU and GPU tasks contend for shared resources.
This paper proposes a heterogeneity model that predicts the shared-resource contention among the simultaneously running tasks of a MapReduce application when heterogeneous computing resources (e.g., CPUs and GPUs) are allocated to it. To support the approach, we present a heterogeneous MapReduce framework, Hadoop+, which enables CPUs and GPUs to process big data in a coordinated manner and leverages the heterogeneity model to help users select computing resources for different purposes.
Our experimental results show three benefits. First, Hadoop+ exploits GPU capability, achieving 1.4x to 16.1x speedups over Hadoop for five real applications when each runs individually. Second, the heterogeneity model can be used to allocate GPUs among multiple simultaneously running MapReduce applications, yielding up to 36.9% (17.6% on average) speedup. Third, the model is shown to select the optimal or most cost-effective resource allocation.
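
To make the idea of contention-aware resource selection concrete, the following is a minimal, hypothetical sketch in Python (not Hadoop+'s actual model or API): it assumes per-slot processing rates for CPU and GPU tasks, a GPU cost factor, and a simple multiplicative contention penalty, then enumerates allocations to find the fastest and the most cost-effective one. All names, rates, and the penalty form are illustrative assumptions.

# Hypothetical sketch (not Hadoop+'s actual model or API): picking a CPU/GPU
# allocation for a MapReduce job when co-running CPU and GPU tasks contend
# for shared resources. The per-slot rates, the GPU cost factor, and the
# multiplicative contention penalty are illustrative assumptions.

def estimated_throughput(cpu_slots, gpu_slots,
                         cpu_rate=1.0, gpu_rate=8.0, contention=0.15):
    # Aggregate rate of all slots, discounted once per co-running CPU/GPU pair.
    raw = cpu_slots * cpu_rate + gpu_slots * gpu_rate
    penalty = (1.0 - contention) ** min(cpu_slots, gpu_slots)
    return raw * penalty

def pick_allocation(max_cpu, max_gpu, gpu_cost_per_slot=4.0):
    # Enumerate feasible allocations; track the fastest one and the one with
    # the best throughput-per-cost ratio.
    best_perf = best_value = None
    for c in range(max_cpu + 1):
        for g in range(max_gpu + 1):
            if c + g == 0:
                continue
            t = estimated_throughput(c, g)
            cost = c + g * gpu_cost_per_slot  # GPUs assumed pricier per slot
            if best_perf is None or t > best_perf[0]:
                best_perf = (t, c, g)
            if best_value is None or t / cost > best_value[0]:
                best_value = (t / cost, c, g)
    return best_perf[1:], best_value[1:]

if __name__ == "__main__":
    fastest, cheapest = pick_allocation(max_cpu=12, max_gpu=2)
    print("fastest allocation (cpu, gpu):", fastest)
    print("most cost-effective allocation (cpu, gpu):", cheapest)

In Hadoop+ itself, the contention term would come from the heterogeneity model's prediction for the co-running tasks rather than a fixed constant.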

    Published In

    ICS '15: Proceedings of the 29th ACM on International Conference on Supercomputing
    June 2015
    446 pages
    ISBN:9781450335591
    DOI:10.1145/2751205

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 08 June 2015

    Qualifiers

    • Research-article

    Funding Sources

    • National High Technology Research and Development Program of China
    • National Basic Research Program of China
    • NSFC
    • Australian Research Council (ARC) Grants

    Conference

    ICS'15: 2015 International Conference on Supercomputing
    June 8 - 11, 2015
    Newport Beach, California, USA

    Acceptance Rates

    ICS '15 Paper Acceptance Rate 40 of 160 submissions, 25%;
    Overall Acceptance Rate 629 of 2,180 submissions, 29%


    Cited By

    • (2019) PPOpenCL: a performance-portable OpenCL compiler with host and kernel thread code fusion. Proceedings of the 28th International Conference on Compiler Construction, pp. 2-16. DOI: 10.1145/3302516.3307350. Online publication date: 16-Feb-2019.
    • (2018) Zwift. Proceedings of the 2018 International Conference on Supercomputing, pp. 195-206. DOI: 10.1145/3205289.3205325. Online publication date: 12-Jun-2018.
    • (2018) Revisiting Loop Tiling for Datacenters. Proceedings of the 2018 International Conference on Supercomputing, pp. 328-340. DOI: 10.1145/3205289.3205306. Online publication date: 12-Jun-2018.
    • (2018) XOS: An Application-Defined Operating System for Datacenter Computing. 2018 IEEE International Conference on Big Data (Big Data), pp. 398-407. DOI: 10.1109/BigData.2018.8622507. Online publication date: Dec-2018.
    • (2017) MapReduce and Its Applications, Challenges, and Architecture. Journal of Grid Computing, 15(3), pp. 295-321. DOI: 10.1007/s10723-017-9408-0. Online publication date: 1-Sep-2017.
    • (2017) Architecture for the Execution of Tasks in Apache Spark in Heterogeneous Environments. Euro-Par 2016: Parallel Processing Workshops, pp. 504-515. DOI: 10.1007/978-3-319-58943-5_41. Online publication date: 28-May-2017.
    • (2016) Bridging the Semantic Gaps of GPU Acceleration for Scale-out CNN-based Big Data Processing. Proceedings of the 2016 International Conference on Parallel Architectures and Compilation, pp. 315-326. DOI: 10.1145/2967938.2967944. Online publication date: 11-Sep-2016.
    • (2016) Spark-GPU: An accelerated in-memory data processing engine on clusters. 2016 IEEE International Conference on Big Data (Big Data), pp. 273-283. DOI: 10.1109/BigData.2016.7840613. Online publication date: Dec-2016.
    • (2016) High-performance computing environment: a review of twenty years of experiments in China. National Science Review, 3(1), pp. 36-48. DOI: 10.1093/nsr/nww001. Online publication date: 20-Jan-2016.
