
Fault-Tolerant File-I/O for Portable Checkpointing Systems

The Journal of Supercomputing

Abstract

The ftIO system provides portable, fault-tolerant file-I/O by extending the functionality of the ANSI C file system without changing its application programming interface and without depending on system-specific implementations of the standard file operations. ftIO is an extension of the porch compiler and its runtime system. The porch compiler automatically generates code that saves bookkeeping information about ftIO's transactional file operations in portable checkpoints, which can be recovered on a binary-incompatible architecture. We developed a new algorithm for supporting transactional file operations in ftIO. Rather than using the well-known two-phase commit protocol, this algorithm uses only a single bit of information and an atomic rename file operation to guarantee fault tolerance. In this paper, we describe the new ftIO algorithm, discuss design choices for ftIO, and present experimental data from our ftIO prototype.




Cite this article

Lyubashevskiy, I., Strumpen, V. Fault-Tolerant File-I/O for Portable Checkpointing Systems. The Journal of Supercomputing 16, 69–92 (2000). https://doi.org/10.1023/A:1008133513763


  • DOI: https://doi.org/10.1023/A:1008133513763
