Abstract
This paper proposes and evaluates an integrated recoverable distributed shared memory (IRDSM) that combines the coherency control of distributed shared memory with checkpoint/recovery control, using a workstation cluster as the distributed environment. In a distributed shared memory (DSM) system, the copies that allow multiple readers to access the same data simultaneously are also used as replicas for recovery, which reduces the data transfers needed for checkpoint/recovery. Data that has no copy is replicated lazily, because a future access to the data may create a copy and thereby hide the overhead of replicating it for recovery. Lazy replication also exploits the differences between a copy and a replica to reduce the amount of data transmitted. An evaluation using programs from the SPLASH parallel benchmark suite is presented.
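The differential part of the idea can be illustrated with a minimal sketch. It is not the paper's implementation: the page representation, the byte-level granularity, and the names `compute_diff`, `apply_diff`, and `diff_size` are all assumptions made for illustration. It shows only the general principle that, when a node already holds a stale replica, sending the changed byte runs can be cheaper than sending the whole page.

```python
from typing import List, Tuple

# A diff is a list of (offset, changed_bytes) runs. This encoding is a
# hypothetical choice for illustration, not the format used in the paper.
Diff = List[Tuple[int, bytes]]


def compute_diff(replica: bytes, current: bytes) -> Diff:
    """Return the byte runs in `current` that differ from `replica`."""
    if len(replica) != len(current):
        raise ValueError("pages must have the same size")
    diff: Diff = []
    i, n = 0, len(current)
    while i < n:
        if replica[i] != current[i]:
            start = i
            # Extend the run while consecutive bytes keep differing.
            while i < n and replica[i] != current[i]:
                i += 1
            diff.append((start, current[start:i]))
        else:
            i += 1
    return diff


def apply_diff(replica: bytes, diff: Diff) -> bytes:
    """Bring a stale replica up to date by patching in the changed runs."""
    page = bytearray(replica)
    for offset, data in diff:
        page[offset:offset + len(data)] = data
    return bytes(page)


def diff_size(diff: Diff) -> int:
    """Payload bytes carried by the diff (ignoring per-run offset overhead)."""
    return sum(len(data) for _, data in diff)
```

In this sketch, a node that modified three bytes of a page would transmit three payload bytes plus run offsets instead of the full page, and the receiver would call `apply_diff` on its stale replica to obtain the up-to-date contents.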
Copyright information
© 1998 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Osawa, N., Yuba, T. (1998). Lazy and differential replication in a recoverable distributed shared memory system. In: Sloot, P., Bubak, M., Hertzberger, B. (eds) High-Performance Computing and Networking. HPCN-Europe 1998. Lecture Notes in Computer Science, vol 1401. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0037197
DOI: https://doi.org/10.1007/BFb0037197
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-64443-9
Online ISBN: 978-3-540-69783-1
eBook Packages: Springer Book Archive