DOI: 10.1145/2063384.2063462
research-article

Purlieus: locality-aware resource allocation for MapReduce in a cloud

Published: 12 November 2011

ABSTRACT

We present Purlieus, a MapReduce resource allocation system aimed at enhancing the performance of MapReduce jobs in the cloud. Purlieus provisions virtual MapReduce clusters in a locality-aware manner, enabling MapReduce virtual machines (VMs) to access input data and, importantly, intermediate data from local or close-by physical machines. We demonstrate how this locality-awareness during both the map and reduce phases of a job not only improves the runtime performance of individual jobs but also reduces the network traffic generated in the cloud data center. This is accomplished through a novel coupling of the otherwise independent data placement and VM placement steps. We conduct a detailed evaluation of Purlieus and demonstrate significant savings in network traffic and an almost 50% reduction in job execution times for a variety of workloads.
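
To make the idea of locality-aware provisioning concrete, the following is a minimal toy sketch, not the Purlieus algorithm itself (the abstract does not specify it). It assumes one VM per input block and a simple greedy rule: prefer the physical machine that stores the block, then a machine in the same rack, then any machine with a free slot. The names `place_vms`, `distance`, `block_hosts`, `rack_of`, and `capacity` are all hypothetical and chosen for illustration only.

```python
from collections import Counter


def distance(host_a, host_b, rack_of):
    """Toy network distance: 0 = same machine, 1 = same rack, 2 = different rack."""
    if host_a == host_b:
        return 0
    if rack_of[host_a] == rack_of[host_b]:
        return 1
    return 2


def place_vms(block_hosts, rack_of, capacity):
    """Greedily place one VM per input block as close to the block as possible.

    block_hosts: block id -> physical machine storing that block (assumed input)
    rack_of:     physical machine -> rack id (assumed topology)
    capacity:    physical machine -> number of free VM slots (assumed)
    """
    free = dict(capacity)  # copy so the caller's capacities are not modified
    placement = {}
    for block, data_host in block_hosts.items():
        candidates = [h for h, slots in free.items() if slots > 0]
        if not candidates:
            raise RuntimeError("no free VM slots left")
        # Pick the closest machine; break ties by name for determinism.
        best = min(candidates, key=lambda h: (distance(h, data_host, rack_of), h))
        free[best] -= 1
        placement[block] = best
    return placement


def locality_histogram(placement, block_hosts, rack_of):
    """Count how many VMs ended up at each distance from their input block."""
    return Counter(
        distance(vm_host, block_hosts[block], rack_of)
        for block, vm_host in placement.items()
    )


# Small illustrative input: three blocks, three machines, two racks.
example_blocks = {"b1": "m1", "b2": "m1", "b3": "m3"}
example_racks = {"m1": "r1", "m2": "r1", "m3": "r2"}
example_capacity = {"m1": 1, "m2": 1, "m3": 1}
example_placement = place_vms(example_blocks, example_racks, example_capacity)
```

In this example, two blocks share machine `m1`, which has only one free slot, so the second VM falls back to `m2` in the same rack rather than crossing racks. A real system would also have to account for intermediate (map output) data and for the data placement step, which the abstract describes as coupled with VM placement.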


  • Published in

    SC '11: Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis
    November 2011
    866 pages
    ISBN:9781450307710
    DOI:10.1145/2063384

    Copyright © 2011 ACM

    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Acceptance Rates

    SC '11 Paper Acceptance Rate: 74 of 352 submissions, 21%
    Overall Acceptance Rate: 1,516 of 6,373 submissions, 24%
