AME: an anyscale many-task computing engine

Research article. DOI: 10.1145/2110497.2110513. Published: 14 November 2011.

ABSTRACT

Many-Task Computing (MTC) is a new application category that encompasses increasingly popular applications in biology, economics, and statistics. The high inter-task parallelism and data-intensive processing capabilities of these applications pose new challenges to existing supercomputer hardware-software stacks. These challenges include resource provisioning; task dispatching, dependency resolution, and load balancing; data management; and resilience.

This paper examines the characteristics of MTC applications that create these challenges, and identifies related gaps in the middleware that supports these applications on extreme-scale systems. Based on this analysis, we propose AME, an Anyscale MTC Engine, which addresses the scalability aspects of these gaps. We describe the AME framework and present performance results for both synthetic benchmarks and real applications. Our results show that AME's dispatching performance scales linearly up to 14,120 tasks/second on 16,384 cores with high efficiency. The overhead of the intermediate data management scheme does not increase significantly up to 16,384 cores. AME eliminates 73% of the file transfer between compute nodes and the global filesystem for the Montage astronomy application running on 2,048 cores. Our results indicate that AME scales well on today's petascale machines, and is a strong candidate for exascale machines.
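The dispatching model the abstract alludes to, a central engine draining a queue of many small independent tasks across a pool of workers, can be sketched as follows. This is purely an illustration of the many-task pattern, not AME's actual interface; the names (`dispatch_all`, `worker`) are hypothetical.

```python
import queue
import threading

def dispatch_all(tasks, n_workers=4):
    """Run independent (task_id, fn, arg) tasks on n_workers threads.

    Returns a dict mapping task_id -> result. Tasks are assumed to be
    independent (no dependency resolution), the simplest MTC workload.
    """
    work = queue.Queue()
    for task in tasks:
        work.put(task)

    results = {}
    results_lock = threading.Lock()

    def worker():
        # Each worker repeatedly pulls a task until the queue is drained.
        while True:
            try:
                task_id, fn, arg = work.get_nowait()
            except queue.Empty:
                return
            out = fn(arg)  # execute the task body
            with results_lock:
                results[task_id] = out

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return results
```

A real engine such as AME layers resource provisioning, dependency resolution, load balancing, and intermediate data placement on top of this basic pattern; the sketch only shows the task-draining core.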


Published in:
WORKS '11: Proceedings of the 6th Workshop on Workflows in Support of Large-Scale Science
November 2011, 154 pages
ISBN: 9781450311007
DOI: 10.1145/2110497
Copyright © 2011 ACM


Publisher: Association for Computing Machinery, New York, NY, United States


Overall acceptance rate: 30 of 54 submissions, 56%
