DOI: 10.1145/2443416.2443421
research-article

Turbine: a distributed-memory dataflow engine for extreme-scale many-task applications

Published: 20 May 2012

Abstract

Efficiently utilizing the rapidly increasing concurrency of multi-petaflop computing systems is a significant programming challenge. One approach is to structure applications with an upper layer of many loosely coupled, coarse-grained tasks, each comprising a tightly coupled parallel function or program. "Many-task" programming models such as functional parallel dataflow may be used at the upper layer to generate massive numbers of tasks, each of which generates significant tightly coupled parallelism at the lower level via multithreading, message passing, and/or partitioned global address spaces. At large scales, however, the management of task distribution, data dependencies, and inter-task data movement is a significant performance challenge. In this work, we describe Turbine, a new highly scalable and distributed many-task dataflow engine. Turbine executes a generalized many-task intermediate representation with automated self-distribution, and is scalable to multi-petaflop infrastructures. We present here the architecture of Turbine and its performance on highly concurrent systems.
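
To make the dataflow idea in the abstract concrete, the following is a minimal illustrative sketch in Python, not Turbine's implementation or API. It assumes a hypothetical helper, run_dataflow, that takes a dictionary of named tasks with their input dependencies and releases each task as soon as all of its inputs are available. It runs in a single process with a thread pool, so it shows only the dependency-driven scheduling; the distributed-memory evaluation and load balancing that the paper addresses are out of scope here.

```python
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED


def run_dataflow(tasks, max_workers=4):
    """Run tasks as their inputs become ready (hypothetical helper).

    tasks maps a task name to (function, [names of input tasks]).
    Returns a dict mapping each task name to its result.
    """
    results = {}           # name -> value produced by a finished task
    running = {}           # future -> name of the task it is executing
    pending = dict(tasks)  # tasks that have not been submitted yet

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while pending or running:
            # Release every pending task whose inputs are all available.
            ready = [name for name, (_, deps) in pending.items()
                     if all(d in results for d in deps)]
            for name in ready:
                func, deps = pending.pop(name)
                args = [results[d] for d in deps]
                running[pool.submit(func, *args)] = name

            if not running:
                # Nothing is running and nothing can start: cycle or missing input.
                raise ValueError("unsatisfiable dependencies: %s" % sorted(pending))

            # Block until at least one running task finishes, then record it.
            done, _ = wait(list(running), return_when=FIRST_COMPLETED)
            for fut in done:
                results[running.pop(fut)] = fut.result()

    return results


if __name__ == "__main__":
    # "a" and "b" are independent and may run concurrently;
    # "total" waits for both, and "square" waits for "total".
    example = {
        "a": (lambda: 2, []),
        "b": (lambda: 3, []),
        "total": (lambda x, y: x + y, ["a", "b"]),
        "square": (lambda s: s * s, ["total"]),
    }
    print(run_dataflow(example)["square"])
```

The task graph is expressed purely through data dependencies, so the order of execution is implied rather than written out, which is the property that lets a dataflow engine expose concurrency automatically.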


    Reviews

    Michael G. Murphy

    Harnessing the potential of high-performance computing (HPC) is an ongoing challenge. In this paper, the authors present a model in which dataflow programs utilize distributed memory evaluation in an extreme-scale computing environment while broadly spreading both the evaluation of programs and the generation of tasks. The model, Turbine, is a scalable dataflow engine built around distributed memory and message passing. The authors contend that allowing distributed execution of program fragments and processing of structural fragments can better exploit parallelism and concurrency.

    An effective introductory section sets the tone and introduces Turbine. The second section provides motivation by looking at several applications that can benefit from an engine like Turbine. The next two sections address the foundational approach for Turbine, which builds on the Swift parallel scripting language and the asynchronous distributed load balancing (ADLB) library with extensions for dataflow processing. These sections include a number of use cases. Section 5 focuses on implementation issues, including program structure and distribution of data storage. The next section covers performance issues related to task distribution, data operations, distributed data structures, and distributed iteration. The seventh section contrasts related work with the approach taken with Turbine. A conclusion and list of 36 key references close the paper.

    The authors have made a substantial contribution to HPC with Turbine, and this readable and well-organized paper, together with future results, should nudge the field in a promising direction. In particular, it will be interesting to see if Turbine can successfully migrate from Swift to other dataflow languages.

    Online Computing Reviews Service


    Published In

    SWEET '12: Proceedings of the 1st ACM SIGMOD Workshop on Scalable Workflow Execution Engines and Technologies
    May 2012
    58 pages
    ISBN:9781450318761
    DOI:10.1145/2443416

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. ADLB
    2. MPI
    3. concurrency
    4. dataflow
    5. exascale
    6. swift
    7. turbine

    Qualifiers

    • Research-article

    Conference

    SWEET 2012

    Acceptance Rates

    Overall Acceptance Rate 4 of 6 submissions, 67%

    Cited By

    • (2019) Data-Intensive Workflow Management: For Clouds and Data-Intensive and Scalable Computing Environments. Synthesis Lectures on Data Management, 14:4 (1-179). DOI: 10.2200/S00915ED1V01Y201904DTM060. Online publication date: 13-May-2019
    • (2019) Managing genomic variant calling workflows with Swift/T. PLOS ONE, 14:7 (e0211608). DOI: 10.1371/journal.pone.0211608. Online publication date: 9-Jul-2019
    • (2018) Improving Parallelism in Data-Intensive Workflows with Distributed Databases. 2018 IEEE International Conference on Services Computing (SCC) (209-216). DOI: 10.1109/SCC.2018.00034. Online publication date: Jul-2018
    • (2016) Challenges and Opportunities for Dataflow Processing on Exascale Computers. Proceedings of the Sixth Workshop on Data-Flow Execution Models for Extreme Scale Computing (1-5). DOI: 10.1145/3292533.3292537. Online publication date: 15-Sep-2016
    • (2015) Stratified Sampling for Even Workload Partitioning Applied to IDA* and Delaunay Algorithms. Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium (460-469). DOI: 10.1109/IPDPS.2015.63. Online publication date: 25-May-2015
    • (2015) Choosing experiments to accelerate collective discovery. Proceedings of the National Academy of Sciences, 112:47 (14569-14574). DOI: 10.1073/pnas.1509757112. Online publication date: 9-Nov-2015
    • (2015) Data-centric iteration in dynamic workflows. Future Generation Computer Systems, 46:C (114-126). DOI: 10.1016/j.future.2014.10.021. Online publication date: 1-May-2015
    • (2015) A Survey of Data-Intensive Scientific Workflow Management. Journal of Grid Computing, 13:4 (457-493). DOI: 10.1007/s10723-015-9329-8. Online publication date: 1-Dec-2015
    • (2014) Compiler techniques for massively scalable implicit task parallelism. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (299-310). DOI: 10.1109/SC.2014.30. Online publication date: 16-Nov-2014
    • (2014) Exploring Large Scale Receptor-Ligand Pairs in Molecular Docking Workflows in HPC Clouds. Proceedings of the 2014 IEEE International Parallel & Distributed Processing Symposium Workshops (536-545). DOI: 10.1109/IPDPSW.2014.65. Online publication date: 19-May-2014
