Performance Analysis of Adapting a MapReduce Framework to Dynamically Accommodate Heterogeneity

Hartog, Jessica; DelValle, Renan; Govindaraju, Madhusudhan; Lewis, Michael J.

doi:10.1007/978-3-662-46703-9_5

Part of the book series: Lecture Notes in Computer Science (TLDKS, volume 9070)

Abstract

When data centers employ the common and economical practice of upgrading subsets of nodes incrementally, rather than replacing or upgrading all nodes at once, they end up with clusters whose nodes have non-uniform processing capability, which we also call performance-heterogeneity. Popular frameworks supporting the effective MapReduce programming model for Big Data applications do not flexibly adapt to these environments. Instead, existing MapReduce frameworks, including Hadoop, typically divide data evenly among worker nodes, thereby inducing the well-known problem of stragglers on slower nodes. Our alternative MapReduce framework, called MARLA, divides each worker’s labor into sub-tasks, delays the binding of data to worker processes, and thereby enables applications to run faster in performance-heterogeneous environments. This approach does introduce overhead, however. We explore and characterize the opportunity for performance gains, and identify when the benefits outweigh the costs. Our results suggest that frameworks should support finer-grained sub-tasking and dynamic data partitioning when running on some performance-heterogeneous clusters. Blindly taking this approach in homogeneous clusters can slow applications down. Our study further suggests the opportunity for cluster managers to build performance-heterogeneous clusters by design, if they also run MapReduce frameworks that can exploit them.
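The trade-off described in the abstract can be illustrated with a toy model. The sketch below is not MARLA's scheduler; it is a minimal simulation under stated assumptions: each node has a fixed relative speed, work divides perfectly, and every sub-task carries a fixed cost (`overhead`) standing in for the extra bookkeeping of late data binding. The function names, node speeds, and overhead value are all hypothetical and chosen only for illustration.

```python
import heapq


def static_makespan(total_work, speeds):
    """Even split: every node gets the same share, so the slowest node
    (the straggler) determines the finish time."""
    share = total_work / len(speeds)
    return max(share / s for s in speeds)


def dynamic_makespan(total_work, speeds, subtasks_per_node, overhead):
    """Late binding: work is cut into small sub-tasks and each node pulls
    the next one as soon as it becomes idle. `overhead` is an assumed
    fixed cost paid per sub-task."""
    n_tasks = subtasks_per_node * len(speeds)
    task_work = total_work / n_tasks
    # Min-heap of (time the node becomes idle, node index).
    idle = [(0.0, i) for i in range(len(speeds))]
    heapq.heapify(idle)
    finish = 0.0
    for _ in range(n_tasks):
        free_at, i = heapq.heappop(idle)
        done = free_at + overhead + task_work / speeds[i]
        finish = max(finish, done)
        heapq.heappush(idle, (done, i))
    return finish


if __name__ == "__main__":
    work = 120.0
    for label, speeds in [("heterogeneous", [1, 1, 2, 2]),
                          ("homogeneous", [1, 1, 1, 1])]:
        s = static_makespan(work, speeds)
        d = dynamic_makespan(work, speeds, subtasks_per_node=4, overhead=0.1)
        print(f"{label}: static={s:.2f} dynamic={d:.2f}")
```

In this toy setting, faster nodes end up processing more sub-tasks in the mixed-speed cluster, so the dynamic scheme finishes sooner than the even split. With identical nodes there is nothing to rebalance, and the per-sub-task cost makes the dynamic scheme slightly slower, which mirrors the abstract's caution about homogeneous clusters.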

This work was supported in part by NSF grant CNS-0958501.

Notes

1. MARLA stands for “MApReduce with adaptive Load balancing for heterogeneous and Load imbalAnced clusters.”
2. We do not use the Fastest node configuration for this set of experiments.

Author information

Authors and Affiliations

Department of Computer Science, State University of New York (SUNY) at Binghamton, Binghamton, NY, 13902, USA
Jessica Hartog, Renan DelValle, Madhusudhan Govindaraju & Michael J. Lewis

Corresponding author

Correspondence to Jessica Hartog.

Editor information

Editors and Affiliations

IRIT, Paul Sabatier University, Toulouse, France
Abdelkader Hameurlain
FAW, University of Linz, Linz, Austria
Josef Küng
FAW, University of Linz, Linz, Austria
Roland Wagner
King Saud bin Abdulaziz University for Health Sciences, Riyadh, Saudi Arabia
Sherif Sakr
Chinese Academy of Sciences, Beijing, China
Lizhe Wang
The University of Sydney, Sydney, Australia
Albert Zomaya

About this chapter

Cite this chapter

Hartog, J., DelValle, R., Govindaraju, M., Lewis, M.J. (2015). Performance Analysis of Adapting a MapReduce Framework to Dynamically Accommodate Heterogeneity. In: Hameurlain, A., Küng, J., Wagner, R., Sakr, S., Wang, L., Zomaya, A. (eds) Transactions on Large-Scale Data- and Knowledge-Centered Systems XX. Lecture Notes in Computer Science, vol 9070. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-46703-9_5

DOI: https://doi.org/10.1007/978-3-662-46703-9_5
Published: 18 March 2015
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-46702-2
Online ISBN: 978-3-662-46703-9
eBook Packages: Computer Science, Computer Science (R0)
