article

MOLAR: adaptive runtime support for high-end computing operating and runtime systems

Authors:

Christian Engelmann,

Stephen L. Scott,

David E. Bernholdt,

Narasimha R. Gottumukkala,

Chokchai Leangsuksun,

Jyothish Varma,

Aniruddha G. Shet,

P. SadayappanAuthors Info & Claims

ACM SIGOPS Operating Systems Review, Volume 40, Issue 2

Pages 63 - 72

https://doi.org/10.1145/1131322.1131337

Published: 01 April 2006 Publication History

Abstract

MOLAR is a multi-institutional research effort that concentrates on adaptive, reliable, and efficient operating and runtime system (OS/R) solutions for ultra-scale high-end scientific computing on the next generation of supercomputers. This research addresses the challenges outlined in FAST-OS (forum to address scalable technology for runtime and operating systems) and HECRTF (high-end computing revitalization task force) activities by exploring the use of advanced monitoring and adaptation to improve application performance and predictability of system interruptions, and by advancing computer reliability, availability and serviceability (RAS) management systems to work cooperatively with the OS/R to identify and preemptively resolve system issues. This paper describes recent research of the MOLAR team in advancing RAS for high-end computing OS/Rs.

References

[1]

G. Bosilca, Z. Chen, J. J. Dongarra, and J. Langou. Recovery patterns for iterative methods in a parallel unstable environment. Submitted to SIAM Journal on Scientific Computing, 2005.]]

[2]

Z. Chen, G. E. Fagg, E. Gabriel, J. Langou, T. Angskun, G. Bosilca, and J. J. Dongarra. Building fault survivable MPI programs with FT-MPI using diskless checkpointing. Proceedings of the Symposium on Principles and Practice of Parallel Programming (PPoPP), 2005.]]

[3]

D. E. Culler, R. M. Karp, D. A. Patterson, A. Sahay, K. E. Schauser, E. Santos, R. Subramonian, and T. von Eicken. LogP: Towards a realistic model of parallel computation. Proceedings of 4th Symposion on Principles and Practice of Parallel Programming (PPoPP), pages 1--12, 1993.]]

Digital Library

[4]

X. Defago, A. Schiper, and P. Urban. Total order broadcast and multicast algorithms: Taxonomy and survey. ACM Computing Surveys, 36(4):372--421, 2004.]]

Digital Library

[5]

C. Engelmann and G. A. Geist. A diskless checkpointing algorithm for super-scale architectures applied to the fast fourier transform. Proceedings of Challenges of Large Applications in Distributed Environments (CLADE) Workshop, pages 47--52, 2003.]]

Digital Library

[6]

C. Engelmann and G. A. Geist. A lightweight kernel for the Harness metacomputing framework. Proceedings of 14th Heterogeneous Computing Workshop (HCW), pages 120--126, 2005.]]

Digital Library

[7]

C. Engelmann and G. A. Geist. Super-scalable algorithms for computing on 100,000 processors. Lecture Notes in Computer Science: Proceedings of International Conference on Computational Science (ICCS), 3514:313--320, 2005.]]

Digital Library

[8]

C. Engelmann and G. A. Geist. RMIX: A dynamic, heterogeneous, reconfigurable communication framework. Lecture Notes in Computer Science: Proceedings of International Conference on Computational Science (ICCS), 2006.]]

Digital Library

[9]

C. Engelmann and S. L. Scott. Concepts for high availability in scientific high-end computing. Proceedings of High Availability and Performance Computing Workshop (HAPCW), 2005.]]

[10]

C. Engelmann, S. L. Scott, and G. A. Geist. Distributed peer-to-peer control in Harness. Lecture Notes in Computer Science: Proceedings of International Conference on Computational Science (ICCS), 2330:720--728, 2002.]]

Digital Library

[11]

C. Engelmann, S. L. Scott, and G. A. Geist. High availability through distributed control. Proceedings of High Availability and Performance Computing Workshop (HAPCW), 2004.]]

[12]

C. Engelmann, S. L. Scott, C. Leangsuksun, and X. He. Active/active replication for highly available hpc system services. Proceedings of International Symposium on Frontiers in Availability, Reliability and Security (FARES), 2006.]]

Digital Library

[13]

G. E. Fagg, A. Bukovsky, and J. J. Dongarra. Harness and fault tolerant MPI. Parallel Computing, 27(11):1479--1495, 2001.]]

Digital Library

[14]

Forum to Address Scalable Technology for Runtime and Operating Systems. FAST-OS at http://www.fastos.org.]]

[15]

Forum to Address Scalable Technology for Runtime and Operating Systems (FAST-OS). MOLAR project at http://www.fastos.org/molar.]]

[16]

Fault Tolerant MPI (FT-MPI) Project at University of Tennessee, Knoxville, TN, USA. At http://icl.cs.utk.edu/ftmpi.]]

[17]

E. Gabriel, G. E. Fagg, G. Bosilca, T. Angskun, J. J. Dongarra, J. M. Squyres, V. Sahay, P. Kambadur, B. Barrett, A. Lumsdaine, R. H. Castain, D. J. Daniel, R. L. Graham, and T. S. Woodall. Open MPI: Goals, concept, and design of a next generation MPI implementation. Proceedings of 11th European PVM/MPI Users' Group Meeting, 2004.]]

[18]

G. A. Geist, A. Beguelin, J. J. Dongarra, W. Jiang, R. Manchek, and V. S. Sunderam. PVM: Parallel Virtual Machine: A Users' Guide and Tutorial for Networked Parallel Computing. MIT Press, Cambridge, MA, USA, 1994.]]

Digital Library

[19]

G. A. Geist, J. A. Kohl, S. L. Scott, and P. M. Papadopoulos. HARNESS: Adaptable virtual machine environment for heterogeneous clusters. Parallel Processing Letters, 9(2):253--273, 1999.]]

[20]

N. R. Gottumukkala, C. Leangsuksun, and S. L. Scott. Reliability-aware approach to improve job completion time for large-scale parallel applications. Proceedings of 2nd Workshop on High Performance Computing Reliability Issues (HPCRI), 2006.]]

[21]

HA-OSCAR at Louisiana Tech University, Ruston, LA, USA. http://xcr.cenit.latech.edu/ha-oscar.]]

[22]

I. Haddad, C. Leangsuksun, and S. L. Scott. HA-OSCAR: Towards highly available linux clusters. Linux World Magazine, March 2004.]]

[23]

X. He, L. Ou, S. L. Scott, and C. Engelmann. A highly available cluster storage system using scavenging. Proceedings of High Availability and Performance Computing Workshop (HAPCW), 2004.]]

[24]

High-End Computing Revitalization Task Force. HECRTF at http://www.nitrd.gov/subcommittee/hec/hecrtf-outreach.]]

[25]

InfiniBand. http://www.infinibandta.org/home.]]

[26]

Lawrence Berkeley National Laboratory, Berkeley, CA, USA. Berkeley Lab Checkpoint Restart (BLCR) Project at http://ftg.lbl.gov/checkpoint.]]

[27]

Lawrence Livermore National Laboratory, Livermore, CA, USA. Trace logs at http://www.llnl.gov/asci/platforms/white.]]

[28]

C. Leangsuksun, V. K. Munganuru, T. Liu, S. L. Scott, and C. Engelmann. Asymmetric active-active high availability for high-end computing. Proceedings of 2nd International Workshop on Operating Systems, Programming Environments and Management Tools for High-Performance Computing on Clusters (COSET-2), 2005.]]

[29]

S. Moore, D. Cronk, K. London, and J. Dongarra. Review of Performance Analysis Tools for MPI Parallel Programs. Lecture Notes in Computer Science: 8th European PVM/MPI Users, 2331:241--248, September 2001.]]

Digital Library

[30]

L. Moser, Y. Amir, P. Melliar-Smith, and D. Agarwal. Extended virtual synchrony. Proceedings of 14th International Conference on Distributed Computing Systems (ICDCS), pages 56--65, 1994.]]

[31]

MPICH-V Project at University of Paris - South, France. http://www.Iri.fr/~gk/mpich-v.]]

[32]

MPICH2. http://www-unix.mcs.anl.gov/mpi/mpich2.]]

[33]

MVAPICH2, MPI over InfiniBand Project. http://nowlab.cse.ohio-state.edu/projects/mpi-iba.]]

[34]

J. Nieplocha, V. Tipparaju, M. Krishnan, G. Santhanaraman, and D. Panda. Optimizing Mechanisms for Latency Tolerance in Remote Memory Access Communication on Clusters. IEEE Cluster Computing 2003, December 2003.]]

[35]

Oak Ridge National Laboratory, TN, USA. Harness project at http://www.csm.ornl.gov/harness.]]

[36]

Open MPI Project. http://www.open-mpi.org.]]

[37]

OpenPBS resource manager at Altair Engineering, Troy, MI, USA. http://www.openpbs.org.]]

[38]

PVFS at Clemson University, Clemson, SC, USA. http://www.parl.clemson.edu/pvfs.]]

[39]

PVM Project at Oak Ridge National Laboratory. Oak Ridge, TN, USA. http://www.csm.ornl.gov/pvm.]]

[40]

R. I. Resnick. A modern taxonomy of high availability, 1996. http://www.generalconcepts.com/resources/reliability/resnick/HA.htm.]]

[41]

Science Case for Large-scale Simulation. SCaLeS at http://www.pnl.gov/scales.]]

[42]

A. G. Shet and P. Sadayappan. Performance Instrumentation to Characterize Computation-Communication Overlap in Message-Passing Systems. Technical Report OSU-CISRC-2/06-TR25, The Ohio State University, February 2006.]]

[43]

SLURM resource manager at Lawrence Livermore National Laboratory, Livermore, CA, USA. http://www.llnl.gov/linux/slurm.]]

[44]

TORQUE resource manager at Cluster Resources, Inc., Spanish Fork, UT, USA. http://www.clusterresources.com.]]

[45]

K. Uhlemann. High availability for ultra-scale high-end scientific computing. Master Thesis at the Department of Computer Science of the University of Reading, UK, March 2006.]]

[46]

J. B. White and S. W. Bova. Where's the Overlap? An Analysis of Popular MPI Implementations. Third MPI Developers' and Users' Conference, March 1999.]]

Cited By

Van Hensbergen EForsyth CMcKie JMinnich R(2008)Holistic aggregate resource environmentACM SIGOPS Operating Systems Review10.1145/1341312.134132742:1(85-91)Online publication date: 1-Jan-2008
https://dl.acm.org/doi/10.1145/1341312.1341327
Ou LEngelmann CHe XChen XScott S(2007)Symmetric active/active metadata service for highly available cluster storage systemsProceedings of the 19th IASTED International Conference on Parallel and Distributed Computing and Systems10.5555/1647539.1647613(404-409)Online publication date: 6-Nov-2007
https://dl.acm.org/doi/10.5555/1647539.1647613
Li Ou He XEngelmann CScott S(2007)A Fast Delivery Protocol for Total Order Broadcasting2007 16th International Conference on Computer Communications and Networks10.1109/ICCCN.2007.4317904(730-734)Online publication date: Aug-2007
https://doi.org/10.1109/ICCCN.2007.4317904
Show More Cited By

Index Terms

Recommendations

Reliability in grid computing systems
A Special Issue from the Open Grid Forum

In recent years, grid technology has emerged as an important tool for solving compute-intensive problems within the scientific community and in industry. To further the development and adoption of this technology, researchers and practitioners from ...
A survey of fault tolerance mechanisms and checkpoint/restart implementations for high performance computing systems
Abstract
In recent years, High Performance Computing (HPC) systems have been shifting from expensive massively parallel architectures to clusters of commodity PCs to take advantage of cost and performance benefits. Fault tolerance in such systems is a ...
Towards an immortal operating system in virtual environments
Highlights
- We show how a commercial OS can be successfully recovered from a crash.
- Support ...
Abstract
Many OS crashes are caused by bugs in kernel extensions or device drivers while the OS itself may have been tested rigorously. To make an OS immortal we must resurrect the OS from these crashes. We present a novel OS-hypervisor ...

Comments

Information & Contributors

Information

Published In

cover image ACM SIGOPS Operating Systems Review

ACM SIGOPS Operating Systems Review Volume 40, Issue 2

April 2006

107 pages

ISSN:0163-5980

DOI:10.1145/1131322

Issue’s Table of Contents

Copyright © 2006 Authors.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 April 2006

Published in SIGOPS Volume 40, Issue 2

Check for updates

Author Tags

Qualifiers

Article

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

6
Total Citations
View Citations
339
Total Downloads

Downloads (Last 12 months)2
Downloads (Last 6 weeks)0

Reflects downloads up to 25 Feb 2025

Other Metrics

View Author Metrics

Citations

Cited By

Van Hensbergen EForsyth CMcKie JMinnich R(2008)Holistic aggregate resource environmentACM SIGOPS Operating Systems Review10.1145/1341312.134132742:1(85-91)Online publication date: 1-Jan-2008
https://dl.acm.org/doi/10.1145/1341312.1341327
Ou LEngelmann CHe XChen XScott S(2007)Symmetric active/active metadata service for highly available cluster storage systemsProceedings of the 19th IASTED International Conference on Parallel and Distributed Computing and Systems10.5555/1647539.1647613(404-409)Online publication date: 6-Nov-2007
https://dl.acm.org/doi/10.5555/1647539.1647613
Li Ou He XEngelmann CScott S(2007)A Fast Delivery Protocol for Total Order Broadcasting2007 16th International Conference on Computer Communications and Networks10.1109/ICCCN.2007.4317904(730-734)Online publication date: Aug-2007
https://doi.org/10.1109/ICCCN.2007.4317904
Gottumukkala NLeangsuksun CTaerat NNassar RScott S(2007)Reliability-aware resource allocation in HPC systemsProceedings of the 2007 IEEE International Conference on Cluster Computing10.1109/CLUSTR.2007.4629245(312-321)Online publication date: 17-Sep-2007
https://dl.acm.org/doi/10.1109/CLUSTR.2007.4629245
Engelmann CScott SLeangsuksun CHe X(2007)Transparent Symmetric Active/Active Replication for Service-Level High AvailabilityProceedings of the Seventh IEEE International Symposium on Cluster Computing and the Grid10.1109/CCGRID.2007.116(755-760)Online publication date: 14-May-2007
https://dl.acm.org/doi/10.1109/CCGRID.2007.116
Uhlemann KEngelmann CScott S(2006)JOSHUA: Symmetric Active/Active Replication for Highly Available HPC Job and Resource Management2006 IEEE International Conference on Cluster Computing10.1109/CLUSTR.2006.311855(1-10)Online publication date: Sep-2006
https://doi.org/10.1109/CLUSTR.2006.311855

View Options

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Figures

Tables

Media

View Issue’s Table of Contents