DOI: 10.1145/3220192.3220193
research-article

POSTER: An Intelligent Framework to Parallelize Hadoop Phases

Published: 11 June 2018

ABSTRACT

Hadoop-stock is a reliable, scalable, open-source implementation of the MapReduce framework for processing data-intensive applications in a distributed, parallel environment. In an environment shared by multiple users running various types of applications, there are fewer resources than jobs, so jobs run in multiple waves. Shuffling, the longest phase of a job, has the most adverse effect on job execution time because of the network traffic it generates. On the one hand, because the shuffle phase depends on the reduce task, it cannot start until the reduce task has been scheduled. On the other hand, static scheduling of reduce tasks wastes reduce slots. This paper presents our ongoing effort to design an intelligent service in which the sort/merge and shuffle phases are completely independent of the map and reduce phases and can run in parallel with them. This parallelism reduces job completion time.
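The scheduling dependency described in the abstract can be illustrated with a toy timeline model. This is only a sketch under stated assumptions, not the authors' framework: the function name `completion_time`, all of its parameters, and the numbers in the usage note are hypothetical, and the model ignores real Hadoop behavior such as reducers that begin fetching early. It compares a job whose shuffle must wait for a reduce slot with one whose shuffle runs as soon as each map wave finishes.

```python
def completion_time(map_waves, t_map, t_shuffle, t_reduce,
                    reduce_slot_free_at, decoupled):
    """Toy model of one multi-wave job (all inputs are hypothetical).

    map_waves           -- number of sequential map waves
    t_map               -- duration of one map wave
    t_shuffle           -- time to shuffle the output of one map wave
    t_reduce            -- duration of the reduce phase
    reduce_slot_free_at -- time at which a reduce slot becomes available
    decoupled           -- if True, shuffle does not wait for the reduce slot
    """
    # Coupled case: no shuffle work can begin before the reduce task
    # is scheduled. Decoupled case: shuffle may begin at time 0.
    shuffle_done = 0.0 if decoupled else float(reduce_slot_free_at)

    for wave in range(1, map_waves + 1):
        map_done = wave * t_map
        # A wave's output is shuffled once the wave has finished and
        # the previous wave's shuffle is complete.
        shuffle_start = max(map_done, shuffle_done)
        shuffle_done = shuffle_start + t_shuffle

    # The reduce phase needs both its slot and all shuffled data.
    reduce_start = max(shuffle_done, reduce_slot_free_at)
    return reduce_start + t_reduce


if __name__ == "__main__":
    args = dict(map_waves=3, t_map=10, t_shuffle=8, t_reduce=5,
                reduce_slot_free_at=40)
    print("coupled:  ", completion_time(decoupled=False, **args))
    print("decoupled:", completion_time(decoupled=True, **args))
```

With these made-up inputs, the coupled job finishes at time 69 and the decoupled job at time 45, because in the decoupled case all shuffling completes before the reduce slot becomes free. The size of the gap depends entirely on the chosen numbers and says nothing about measured results.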


    Published in

      HPDC '18: Proceedings of the 27th International Symposium on High-Performance Parallel and Distributed Computing
      June 2018
      25 pages
      ISBN: 9781450358996
      DOI: 10.1145/3220192

      Copyright © 2018 ACM


      Publisher

      Association for Computing Machinery

      New York, NY, United States



      Acceptance Rates

      Overall Acceptance Rate: 166 of 966 submissions, 17%
