A Low Overhead Logging Scheme for Fast Recovery in Distributed Shared Memory Systems

The Journal of Supercomputing

Abstract

This paper presents an efficient, writer-based logging scheme for recoverable distributed shared memory systems, in which each data item is logged by the process that writes it rather than by every process that accesses it. Because the writer process keeps the log of the data items it writes, that log can be held in volatile storage. To tolerate multiple failures, only the readers' access information needs to be written to the writer's stable storage. To further reduce the frequency of stable logging, only data items accessed by multiple processes are logged, together with their access information, when the items are invalidated; semantic-based optimizations of logging are also considered. Compared with earlier schemes, which performed stable logging whenever a process accessed or wrote a new data item, the proposed scheme can significantly reduce both the size of the log and the logging frequency.
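The writer-based idea can be illustrated with a small simulation. This is a simplified model under stated assumptions, not the authors' protocol: it assumes each data item has a single fixed writer (owner), that writing a new version is what invalidates cached copies, and that an item "accessed by multiple processes" means one read by at least one process other than its writer. The names `Process`, `serve_read`, `invalidate`, `run`, `OWNERS` and `TRACE` are hypothetical. The sketch counts stable-storage writes under a simplified eager baseline (one stable write per write and per newly accessed version) and under writer-based logging; semantic-based optimizations are not modeled.

```python
from dataclasses import dataclass, field

# Hypothetical model of one DSM process under writer-based logging.
# Class/method names, the fixed-owner write model and the flush rule are
# illustrative assumptions, not the paper's data structures.


@dataclass
class Process:
    pid: int
    volatile_log: dict = field(default_factory=dict)  # item -> [(version, value)] written here
    readers: dict = field(default_factory=dict)       # item -> pids that read the current version
    stable_log: list = field(default_factory=list)    # (item, version, readers) records
    stable_writes: int = 0

    def write(self, item, version, value):
        # The writer keeps the new value in volatile memory only.
        self.volatile_log.setdefault(item, []).append((version, value))
        self.readers.setdefault(item, set())

    def serve_read(self, item, reader_pid):
        # Remember who read the current version; still volatile at this point.
        self.readers.setdefault(item, set()).add(reader_pid)
        return self.volatile_log[item][-1]

    def invalidate(self, item):
        # Flush access info to stable storage only if another process read the item.
        others = self.readers.get(item, set()) - {self.pid}
        if others:
            version, _ = self.volatile_log[item][-1]
            self.stable_log.append((item, version, frozenset(others)))
            self.stable_writes += 1
        self.readers[item] = set()


def run(trace, owners, n):
    """Return (eager stable writes, writer-based stable writes) for a trace."""
    procs = [Process(i) for i in range(n)]
    versions, seen, eager = {}, set(), 0
    for op, pid, item in trace:
        owner = procs[owners[item]]
        if op == "w":
            assert pid == owner.pid, "assumed: only the item's owner writes it"
            if item in versions:
                owner.invalidate(item)  # a new version invalidates cached copies
            versions[item] = versions.get(item, 0) + 1
            owner.write(item, versions[item], f"{item}{versions[item]}")
            eager += 1  # eager baseline: every write is logged stably
        elif op == "r":
            version, _ = owner.serve_read(item, pid)
            if (pid, item, version) not in seen:  # eager: log each newly accessed version
                seen.add((pid, item, version))
                eager += 1
    return eager, sum(p.stable_writes for p in procs)


OWNERS = {"x": 0, "y": 1}
TRACE = [
    ("w", 0, "x"),  # x version 1
    ("r", 1, "x"),  # process 1 reads x version 1
    ("r", 1, "x"),  # repeated read of the same version
    ("w", 1, "y"),  # y version 1
    ("w", 1, "y"),  # y version 2; nobody else read version 1
    ("w", 0, "x"),  # x version 2; version 1 was read by process 1
    ("r", 2, "x"),  # process 2 reads x version 2
]

print(run(TRACE, OWNERS, 3))  # (6, 1)
```

For this trace the eager baseline performs six stable writes while the writer-based model performs one: the only flush happens when `x` version 1, which process 1 had read, is invalidated. The read of `x` version 2 is still recorded only in volatile memory at the end of the trace; a real system would need checkpointing or another mechanism to cover it, which this sketch omits.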

About this article

Cite this article

Park, T., Yeom, H.Y. A Low Overhead Logging Scheme for Fast Recovery in Distributed Shared Memory Systems. The Journal of Supercomputing 15, 295–320 (2000). https://doi.org/10.1023/A:1008116511402

  • DOI: https://doi.org/10.1023/A:1008116511402
