Abstract
Cloud computing has made the resources needed to execute large-scale in-memory distributed computations widely available. Specialized programming models, e.g., MapReduce, have emerged to offer transparent fault tolerance and fault recovery for specific computational patterns, but they sacrifice generality. In contrast, the Resilient X10 programming language adds failure containment and failure awareness to a general-purpose distributed programming language. A Resilient X10 application spans a number of places. Its formal semantics precisely specify how it continues executing after a place failure. Thanks to failure awareness, the X10 programmer can in principle build redundancy into an application to recover from failures. In practice, however, correctness is elusive, as redundancy and recovery are often complex programming tasks.
This article further develops Resilient X10 to shift the focus from failure awareness to failure recovery, from both a theoretical and a practical standpoint. We rigorously define the distinction between recoverable and catastrophic failures. We revisit the happens-before invariance principle and its implementation. We shift most of the burden of redundancy and recovery from the programmer to the runtime system and standard library. We make it easy to protect critical data from failure using resilient stores and harness elasticity—dynamic place creation—to persist not just the data but also its spatial distribution.
We demonstrate the flexibility and practical usefulness of Resilient X10 by building several representative high-performance in-memory parallel application kernels and frameworks. These codes are 10× to 25× larger than previous Resilient X10 benchmarks. For each application kernel, the average runtime overhead of resiliency is less than 7%. By comparing application kernels written in the Resilient X10 and Spark programming models, we demonstrate that Resilient X10’s more general programming model can enable significantly better application performance for resilient in-memory distributed computations.
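To make the idea of a resilient store combined with elasticity more concrete, the following is a minimal single-process sketch in Python. It is not Resilient X10 code and does not reproduce the article's API: the names `Place`, `ResilientStore`, `DeadPlaceError`, `kill`, and `recover` are hypothetical, and the replication scheme (each slot's data is mirrored on the next slot) is an assumption chosen for brevity. The sketch only illustrates how a freshly created place can take over a failed place's slot, so that both the data and its spatial distribution survive.

```python
class DeadPlaceError(Exception):
    """Raised when a failed place is accessed (hypothetical name)."""


class Place:
    """A toy stand-in for a place: an id, a liveness flag, local memory."""

    def __init__(self, pid):
        self.pid = pid
        self.alive = True
        self.memory = {}  # maps (owner_slot, key) -> value


class ResilientStore:
    """Toy store: slot s keeps its own entries plus a backup of slot s-1.

    Assumption: one backup copy per entry, held by the next slot.
    """

    def __init__(self, num_places):
        self.places = [Place(i) for i in range(num_places)]
        self.next_pid = num_places  # ids for dynamically created places

    def _next(self, slot):
        return (slot + 1) % len(self.places)

    def _prev(self, slot):
        return (slot - 1) % len(self.places)

    def put(self, slot, key, value):
        # Write the primary copy and the backup copy.
        for s in (slot, self._next(slot)):
            place = self.places[s]
            if not place.alive:
                raise DeadPlaceError(place.pid)
            place.memory[(slot, key)] = value

    def get(self, slot, key):
        place = self.places[slot]
        if not place.alive:
            raise DeadPlaceError(place.pid)
        return place.memory[(slot, key)]

    def kill(self, slot):
        # Simulate a place failure: the place and its memory are lost.
        self.places[slot].alive = False
        self.places[slot].memory.clear()

    def recover(self, slot):
        """Create a fresh place for `slot` and refill it from its neighbors."""
        nxt = self.places[self._next(slot)]
        prv = self.places[self._prev(slot)]
        if not (nxt.alive and prv.alive):
            # In this toy model, losing a neighbor too means data is gone.
            raise RuntimeError("unrecoverable in this toy model")
        fresh = Place(self.next_pid)  # elasticity: a new place is created
        self.next_pid += 1
        # Primary entries of `slot` are recovered from the backup on `nxt`.
        for (owner, key), value in nxt.memory.items():
            if owner == slot:
                fresh.memory[(owner, key)] = value
        # Backup entries for the previous slot are copied from its primary.
        prev_slot = self._prev(slot)
        for (owner, key), value in prv.memory.items():
            if owner == prev_slot:
                fresh.memory[(owner, key)] = value
        self.places[slot] = fresh  # same slot, so the distribution is kept


if __name__ == "__main__":
    store = ResilientStore(3)
    store.put(1, "x", 42)
    store.kill(1)
    store.recover(1)
    print(store.get(1, "x"))
```

In this sketch the application keeps addressing slot 1 before and after the failure; only the place behind the slot changes. The distinction between a recoverable failure and an unrecoverable one is deliberately simplistic here and should not be read as the article's formal definition.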