Abstract
Advanced computer and network technologies have led to the development of computer networks. Here, an application is realized by multiple processes located on multiple computers connected to a communication network such as the Internet. Each process computes and communicates with other processes by exchanging messages through communication channels. Mission-critical applications are required to be executed fault-tolerantly. That is, even if some processes fail, execution of an application is required to continue. One of the important methods to realize fault-tolerant networks is checkpoint-recovery [2,4,6,7,10–12,16,19–21]. During failure-free execution, each process takes local checkpoints by storing state information into stable storage [14]. If a certain process fails, the processes restart from the checkpoints by restoring the state information from the stable storage. For applications to restart correctly in conventional data communication networks, the set of local checkpoints taken by all the processes, and from which the processes restart, should form a consistent global checkpoint [3]. A global checkpoint is defined to be consistent if it contains neither orphan nor lost messages. However, in a multimedia communication network, applications require transmission of large multimedia messages and low-overhead failure-free execution rather than complete consistency. Hence, this paper proposes novel criteria for consistent global checkpoints based on the properties of multimedia communication networks and applications.
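The conventional consistency condition mentioned in the abstract can be illustrated with a minimal sketch. It assumes the usual reading of the two terms: a message is an orphan if its receipt is recorded in the receiver's checkpoint but its sending is not recorded in the sender's, and it is lost if its sending is recorded but its receipt is not. The names Message, classify and is_consistent, and the use of local event indices to position checkpoints, are hypothetical and not taken from the paper. The sketch covers only the conventional criterion, not the relaxed criteria the paper proposes.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Message:
    sender: str
    send_time: int   # local event index of the send at the sender (assumed model)
    receiver: str
    recv_time: int   # local event index of the receive at the receiver


def classify(msg: Message, checkpoint: Dict[str, int]) -> str:
    """Classify one message against a global checkpoint.

    checkpoint maps each process name to the local event index at which
    its local checkpoint was taken; events with a smaller index are
    recorded in that checkpoint.
    """
    sent_before = msg.send_time < checkpoint[msg.sender]
    received_before = msg.recv_time < checkpoint[msg.receiver]
    if received_before and not sent_before:
        return "orphan"   # receipt recorded, sending not recorded
    if sent_before and not received_before:
        return "lost"     # sending recorded, receipt not recorded
    return "ok"


def is_consistent(messages: List[Message], checkpoint: Dict[str, int]) -> bool:
    """A global checkpoint is consistent if no message is orphan or lost."""
    return all(classify(m, checkpoint) == "ok" for m in messages)


if __name__ == "__main__":
    msgs = [Message("p1", 2, "p2", 5), Message("p2", 7, "p1", 9)]
    for cp in ({"p1": 3, "p2": 6}, {"p1": 3, "p2": 4}, {"p1": 10, "p2": 6}):
        print(cp, [classify(m, cp) for m in msgs], is_consistent(msgs, cp))
```

In this model, moving the checkpoint of p2 earlier than the receipt of the first message makes that message lost, and moving the checkpoint of p1 later than the receipt of the second message makes that message an orphan.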
References
Bernstein, P.A. and Goodman, N., “An Algorithm for Concurrency Control and Recovery in Replicated Distributed Databases,” ACM Trans. on Database Systems, Vol. 9, No. 4, pp. 1197–1207 (1984).
Bhargava, B. and Liao, S.R., “Independent Checkpointing and Concurrent Rollback for Recovery in Distributed Systems,” The 7th International Symposium on Reliable Distributed Systems, pp. 3–12 (1988).
Chandy, K.M. and Lamport, L., “Distributed Snapshots: Determining Global States of Distributed Systems,” ACM Trans. on Computer Systems, Vol. 3, No. 1, pp. 63–75 (1985).
Cristian, F. and Jahanian, F., “A Timestamp-Based Checkpointing Protocol for Long Lived Distributed Computations,” Reliable Distributed Software and Database Systems, pp. 12–20 (1991).
Douglas, E.C., “Internetworking with TCP/IP,” Prentice-Hall (1991).
Elnozahy, E.N., Johnson, D.B. and Wang, Y.M., “A Survey of Rollback-Recovery Protocols in Message-Passing Systems,” Technical Note of Carnegie Mellon University, CMU-CS-96-181 (1996).
Elnozahy, E.N., Johnson, D.B. and Zwaenepoel, W., “The performance of consistent checkpointing,” The 11th International Symposium on Reliable Distributed Systems, pp. 39–47 (1992).
Gifford, D.K., “Weighted Voting for Replicated Data,” The 7th ACM Symposium on Operating Systems, pp. 150–162 (1979).
Higaki, H., Nemoto, N., Tanaka, K. and Takizawa, M., “Protocol for Groups of Pseudo-Active Replication Objects,” International Workshop on Object Oriented Realtime Distributed Systems, pp. 35–41 (1999).
Juang, T.T.Y. and Venkatesan, S., “Efficient Algorithms for Crash Recovery in Distributed Systems,” The 10th Conference on Foundations of Software Technology and Theoretical Computer Science, pp. 349–361 (1990).
Johnson, D.B., “Efficient Transparent Optimistic Rollback Recovery for Distributed Application Programs,” The 12th International Symposium on Reliable Distributed Systems, pp. 86–95 (1993).
Koo, R. and Toueg, S., “Checkpointing and Rollback-Recovery for Distributed Systems,” IEEE Trans. on Software Engineering, Vol. SE-13, No. 1, pp. 23–31 (1987).
Kumar, A., “Hierarchical Quorum Consensus: A New Algorithm for Managing Replicated Data,” IEEE Trans. on Computers, Vol. 40, No. 9, pp. 996–1004 (1991).
Lampson, B.W., Paul, M. and Siegert, H.J., “Distributed Systems-Architecture and Implementation,” Springer-Verlag, pp. 246–265 (1981).
Mathew, E. H. and Russell, M. S., “MULTIMEDIA COMPUTING-Case Studies from MIT Project Athena,” Addison-Wesley (1993).
Pankaj, J., “Fault Tolerance in Distributed Systems,” Prentice Hall, pp. 185–213 (1994).
Pu, C.A., Noe, D.D. and Proudfoot, A., “Regeneration of Replicated objects: A Technique and its Eden Implementation,” IEEE Trans. on Software Engineering, Vol. 14, No. 7, pp. 936–945 (1988).
Shimamura, K., Tanaka, K. and Takizawa, M., “Group Protocol for Exchanging Multimedia Objects in a Group,” 2000 ICDCS Workshop on Group Computation and Communications, pp. 33–40 (2000).
Silva, L.M. and Silva, J.G., “Global Checkpointing for Distributed Programs,” The 11th International Symposium on Reliable Distributed Systems, pp. 155–162 (1992).
Venkatesh, K., Radhakrishnan, T. and Li, H.F., “Optimal and Local Recording for Domino-Free Rollback Recovery,” Information Processing Letters, Vol. 25, pp. 295–303 (1987).
Wood, W.G., “A Decentralized Recovery Protocol,” The 11th International Symposium on Fault Tolerant Computing Systems, pp. 159–164 (1981).
Copyright information
© 2001 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Osada, S., Higaki, H. (2001). QoS-based Checkpoint Protocol for Multimedia Network Systems. In: Shum, HY., Liao, M., Chang, SF. (eds) Advances in Multimedia Information Processing — PCM 2001. PCM 2001. Lecture Notes in Computer Science, vol 2195. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45453-5_74
DOI: https://doi.org/10.1007/3-540-45453-5_74
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-42680-6
Online ISBN: 978-3-540-45453-3
eBook Packages: Springer Book Archive