
Performance Analysis of Adapting a MapReduce Framework to Dynamically Accommodate Heterogeneity

Transactions on Large-Scale Data- and Knowledge-Centered Systems XX

Part of the book series: Lecture Notes in Computer Science (TLDKS, volume 9070)

Abstract

When data centers employ the common and economical practice of upgrading subsets of nodes incrementally, rather than replacing or upgrading all nodes at once, they end up with clusters whose nodes have non-uniform processing capability, which we also call performance-heterogeneity. Popular frameworks supporting the effective MapReduce programming model for Big Data applications do not flexibly adapt to these environments. Instead, existing MapReduce frameworks, including Hadoop, typically divide data evenly among worker nodes, thereby inducing the well-known problem of stragglers on slower nodes. Our alternative MapReduce framework, called MARLA, divides each worker’s labor into sub-tasks, delays the binding of data to worker processes, and thereby enables applications to run faster in performance-heterogeneous environments. This approach does introduce overhead, however. We explore and characterize the opportunity for performance gains, and identify when the benefits outweigh the costs. Our results suggest that frameworks should support finer-grained sub-tasking and dynamic data partitioning when running on some performance-heterogeneous clusters. Blindly taking this approach in homogeneous clusters can slow applications down. Our study further suggests an opportunity for cluster managers to build performance-heterogeneous clusters by design, if they also run MapReduce frameworks that can exploit them.
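To make the trade-off described in the abstract concrete, here is a minimal Python sketch. It is a toy model, not MARLA's implementation: it assumes each node processes data at a fixed speed, that every sub-task carries a fixed overhead, and that an idle worker simply pulls the next unassigned sub-task (late binding). The function names `static_makespan` and `dynamic_makespan`, and all parameter values, are hypothetical and chosen only for illustration.

```python
import heapq
from typing import Sequence


def static_makespan(speeds: Sequence[float], data_units: float) -> float:
    """Hadoop-style even split: every worker receives the same share up front."""
    share = data_units / len(speeds)
    # The job finishes when the slowest worker finishes its share (the straggler).
    return max(share / s for s in speeds)


def dynamic_makespan(
    speeds: Sequence[float],
    data_units: float,
    num_subtasks: int,
    overhead: float,
) -> float:
    """Late binding (hypothetical model): data is cut into equal sub-tasks and
    each idle worker pulls the next one. `overhead` is an assumed fixed cost
    paid once per sub-task."""
    size = data_units / num_subtasks
    # Min-heap of (time the worker becomes idle, worker index).
    idle = [(0.0, i) for i in range(len(speeds))]
    heapq.heapify(idle)
    makespan = 0.0
    for _ in range(num_subtasks):
        t, i = heapq.heappop(idle)  # earliest-idle worker takes the next sub-task
        finish = t + overhead + size / speeds[i]
        makespan = max(makespan, finish)
        heapq.heappush(idle, (finish, i))
    return makespan


if __name__ == "__main__":
    hetero = [1.0, 1.0, 2.0, 2.0]  # two older nodes, two upgraded nodes twice as fast
    homo = [1.0, 1.0, 1.0, 1.0]    # four identical nodes
    for name, speeds in (("heterogeneous", hetero), ("homogeneous", homo)):
        print(name,
              static_makespan(speeds, 1200),
              dynamic_makespan(speeds, 1200, num_subtasks=48, overhead=0.5))
```

With these made-up parameters the sketch prints `heterogeneous 300.0 208.0` and `homogeneous 300.0 306.0`. Sub-tasking shortens the heterogeneous run because the faster nodes absorb more sub-tasks, while on identical nodes the even split is already balanced and the per-sub-task overhead only adds time. These figures come from the toy model, not from the chapter's experiments.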

This work was supported in part by NSF grant CNS-0958501.


Notes

  1. MARLA stands for “MApReduce with adaptive Load balancing for heterogeneous and Load imbalAnced clusters.”

  2. We do not use the Fastest node configuration for this set of experiments.


Author information

Correspondence to Jessica Hartog.


Copyright information

© 2015 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Hartog, J., DelValle, R., Govindaraju, M., Lewis, M.J. (2015). Performance Analysis of Adapting a MapReduce Framework to Dynamically Accommodate Heterogeneity. In: Hameurlain, A., Küng, J., Wagner, R., Sakr, S., Wang, L., Zomaya, A. (eds) Transactions on Large-Scale Data- and Knowledge-Centered Systems XX. Lecture Notes in Computer Science, vol 9070. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-46703-9_5


  • DOI: https://doi.org/10.1007/978-3-662-46703-9_5

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-662-46702-2

  • Online ISBN: 978-3-662-46703-9

  • eBook Packages: Computer Science, Computer Science (R0)
