DOI: 10.1145/1807167.1807223
research-article
Results Reproduced / v1.1

ParaTimer: a progress indicator for MapReduce DAGs

Published: 06 June 2010

ABSTRACT

Time-oriented progress estimation for parallel queries is a challenging problem that has received only limited attention. In this paper, we present ParaTimer, a new type of time-remaining indicator for parallel queries. Although several parallel data processing systems exist, ParaTimer targets environments in which declarative queries are translated into ensembles of MapReduce jobs. ParaTimer builds on previous techniques and makes two key contributions. First, it estimates the progress of queries that translate into directed acyclic graphs (DAGs) of MapReduce jobs, where jobs on different paths can execute concurrently (unlike prior work, which considered only sequences of jobs). For such queries, we use a new type of critical-path-based progress-estimation approach. Second, ParaTimer handles a variety of real-system challenges such as failures and data skew. To handle unexpected changes in query execution times caused by changing runtime conditions, ParaTimer provides users with not one but a set of time-remaining estimates, each corresponding to a different, carefully selected scenario. We implement our estimator in the Pig system and demonstrate its performance in experiments on a real, small-scale cluster.
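The critical-path idea mentioned in the abstract can be illustrated with a minimal sketch. This is not ParaTimer's actual estimator, which the abstract does not specify. It only assumes that each job has a known estimate of remaining time, that jobs on independent paths run fully concurrently, and that a job starts as soon as all of its predecessors finish. The function name, the job names, and the durations are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def critical_path_remaining(remaining, deps):
    """Longest-path estimate of time remaining for a DAG of jobs.

    remaining: dict mapping job name -> estimated seconds of work left
               (hypothetical input; a real estimator would have to derive it).
    deps:      dict mapping job name -> set of jobs that must finish first.
    Assumes independent jobs run fully concurrently (an idealization).
    """
    # Include every job, even those with no predecessors.
    graph = {job: deps.get(job, set()) for job in remaining}
    finish = {}
    # static_order() yields each job after all of its predecessors.
    for job in TopologicalSorter(graph).static_order():
        start = max((finish[d] for d in graph[job]), default=0.0)
        finish[job] = start + remaining[job]
    return max(finish.values(), default=0.0)


# Hypothetical query plan: C joins the outputs of A and B; D depends on A only.
remaining = {"A": 30, "B": 50, "C": 20, "D": 10}
deps = {"C": {"A", "B"}, "D": {"A"}}
estimate = critical_path_remaining(remaining, deps)
```

In this example the estimate follows the path B then C, so it is shorter than the sum of all job times that a purely sequential model would report. A set of scenario-specific estimates, as described in the abstract, could be approximated in this sketch by rerunning the function with adjusted `remaining` values for each scenario.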

Published in

SIGMOD '10: Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data
June 2010
1286 pages
ISBN: 9781450300322
DOI: 10.1145/1807167

Copyright © 2010 ACM

Publisher: Association for Computing Machinery, New York, NY, United States

Overall Acceptance Rate: 785 of 4,003 submissions, 20%
