
Distributed speculative execution for reliability and fault tolerance: an operational semantics

Distributed Computing

Abstract

This paper examines the use of speculations, a form of distributed transactions, for improving the reliability and fault tolerance of distributed systems. A speculation is a computation based on an assumption that has not been validated when the computation starts. If the assumption is later found to be false, the computation is aborted and the program state is rolled back; if it is found to be true, the results of the computation are committed. The primary difference between a speculation and a transaction is that a speculation is not isolated: a speculative computation may, for example, send and receive messages and modify shared objects. As a result, processes that share those objects may be absorbed into the speculation. We present a syntax and an operational semantics in two forms. The first is a speculative model that takes full advantage of the speculative features. The second is a nonspeculative, nondeterministic model in which aborts are treated as failures. We prove that the two models are equivalent, which demonstrates that speculative execution is equivalent to failure-free computation.
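To make the description above concrete, here is a minimal Python sketch under loudly stated assumptions: processes and shared objects are both modeled as plain state dictionaries, message passing is reduced to writes and reads on a shared object (think of it as a mailbox), and checkpoints are taken the first time an entity comes to depend on the unvalidated assumption. The names `Entity`, `Speculation`, `absorb`, `read`, `write`, `abort`, and `commit` are hypothetical illustrations, not the paper's syntax or operational semantics. The sketch shows rollback on abort, retention on commit, and how a process that reads speculatively modified data is absorbed into the speculation.

```python
import copy


class Entity:
    """A process or shared object whose state is a plain dictionary (toy model)."""

    def __init__(self, name, **state):
        self.name = name
        self.state = dict(state)


class Speculation:
    """Hypothetical toy speculation: every entity that comes to depend on the
    unvalidated assumption is checkpointed and becomes a member."""

    def __init__(self, initiator):
        self.members = {}  # entity name -> (entity, checkpointed state)
        self.done = False
        self.absorb(initiator)

    def _check_active(self):
        if self.done:
            raise RuntimeError("speculation already committed or aborted")

    def absorb(self, entity):
        # Take a checkpoint the first time an entity joins the speculation.
        self._check_active()
        if entity.name not in self.members:
            self.members[entity.name] = (entity, copy.deepcopy(entity.state))

    def write(self, writer, obj, key, value):
        # Not isolated: a speculative write pulls the shared object into the speculation.
        self.absorb(writer)
        self.absorb(obj)
        obj.state[key] = value

    def read(self, reader, obj, key):
        # A process that reads speculative data now depends on the assumption too.
        self._check_active()
        if obj.name in self.members:
            self.absorb(reader)
        return obj.state[key]

    def abort(self):
        # Assumption was false: roll every member back to its checkpoint.
        self._check_active()
        for entity, saved in self.members.values():
            entity.state = copy.deepcopy(saved)
        self.done = True

    def commit(self):
        # Assumption was true: keep the current state and discard the checkpoints.
        self._check_active()
        self.members.clear()
        self.done = True


# Usage: process p speculates, writes a shared object, and q reads it.
p = Entity("p", x=0)
q = Entity("q", seen=None)
shared = Entity("obj", value=1)

spec = Speculation(p)
p.state["x"] = 42                                # speculative local work in p
spec.write(p, shared, "value", 99)               # speculative update to a shared object
q.state["seen"] = spec.read(q, shared, "value")  # q is absorbed here
seen_during = q.state["seen"]                    # 99
absorbed = sorted(spec.members)                  # ['obj', 'p', 'q']

spec.abort()                                     # the assumption turned out to be false
print(p.state, shared.state, q.state)
# {'x': 0} {'value': 1} {'seen': None}
```

Here `q` never initiated the speculation; it is rolled back only because it read a value the speculation produced, which is the absorption effect described in the abstract. Lazy, first-contact checkpointing is simply the choice made for this sketch; the paper's formal semantics may treat it differently.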



Author information

Correspondence to Cristian Ţăpuş.

About this article

Cite this article

Ţăpuş, C., Hickey, J. Distributed speculative execution for reliability and fault tolerance: an operational semantics. Distrib. Comput. 21, 433–455 (2009). https://doi.org/10.1007/s00446-008-0073-1

  • DOI: https://doi.org/10.1007/s00446-008-0073-1