Abstract
Technology trends are expected to increase both transient and permanent fault rates in future processors. Providing reliability is therefore becoming an absolute necessity, not only for mission-critical systems but also for applications running on personal computers. A reliable system requires two key capabilities: 1) error detection and 2) error recovery. Transactional Memory (TM) provides an ideal basis for both. First, TM already provides mechanisms to abort a transaction in case of a conflict: the aborted transaction discards or undoes all of its tentative memory updates and restarts execution from the beginning of the transaction. The start of a transaction can therefore be viewed as a locally checkpointed stable state that can be reused for error recovery. Second, transactional semantics allow error detection to be deferred until a transaction commits (or until a value becomes externally visible). This reduces the cost of error detection compared to traditional schemes, in which error detection is performed at every instruction [26], and can also make detection more efficient.
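The two ideas above can be illustrated with a minimal Python sketch built on simplifying assumptions; it is not the design of any specific system cited in this chapter. It models a lazy-versioning transaction whose tentative writes are buffered, so the committed memory at transaction start acts as the checkpoint. Error detection is modelled, as an assumed example scheme, by executing the same transaction body twice and comparing the two write sets once at commit instead of checking every instruction. On a mismatch both buffers are discarded and the transaction re-executes. The names `Transaction`, `run_redundant`, `transfer` and `flip_once` are hypothetical, and the fault is injected artificially.

```python
from typing import Callable, Dict, Optional

Memory = Dict[str, int]


class Transaction:
    """Hypothetical lazy-versioning transaction: writes stay in a private
    buffer until commit, so committed memory acts as the checkpoint."""

    def __init__(self, memory: Memory) -> None:
        self.memory = memory
        self.write_set: Memory = {}

    def read(self, addr: str) -> int:
        # Read our own tentative write if any, else the checkpointed value.
        return self.write_set.get(addr, self.memory[addr])

    def write(self, addr: str, value: int) -> None:
        # Tentative update: not visible in memory until commit.
        self.write_set[addr] = value


def run_redundant(
    memory: Memory,
    body: Callable[[Transaction], None],
    fault_injector: Optional[Callable[[int, Transaction], None]] = None,
    max_retries: int = 3,
) -> int:
    """Run `body` twice per attempt and compare write sets once, at commit.
    Returns the number of the attempt that committed."""
    for attempt in range(1, max_retries + 1):
        leading, trailing = Transaction(memory), Transaction(memory)
        body(leading)
        body(trailing)
        if fault_injector is not None:
            # Simulated transient fault in the trailing copy's state.
            fault_injector(attempt, trailing)
        if leading.write_set == trailing.write_set:
            # Commit: the checked updates become externally visible.
            memory.update(leading.write_set)
            return attempt
        # Mismatch: abort by dropping both buffers. Memory still holds the
        # state from the transaction start, so we simply re-execute.
    raise RuntimeError("copies disagreed on every attempt (possible permanent fault)")


def transfer(tx: Transaction) -> None:
    tx.write("a", tx.read("a") - 10)
    tx.write("b", tx.read("b") + 10)


def flip_once(attempt: int, tx: Transaction) -> None:
    # Flip the lowest bit of "b" during the first attempt only.
    if attempt == 1:
        tx.write_set["b"] ^= 1


mem = {"a": 100, "b": 0}
print(run_redundant(mem, transfer, flip_once), mem)  # 2 {'a': 90, 'b': 10}
```

In this sketch a fault-free transaction pays for a single comparison at commit, while a detected fault costs one full re-execution from the checkpoint. A fault that corrupts both copies identically would go unnoticed, and a fault that recurs on every attempt makes the sketch give up after `max_retries` and raise an error, leaving memory untouched.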
In this chapter, we first describe hardware faults and the main aspects of reliability schemes, namely error detection and error recovery. Then, we explain the major requirements of reliability schemes and the similarities between these requirements and the basic mechanisms of transactional memory. Finally, we present the current research landscape for reliability schemes that use transactional memory.
References
Adir, A., Goodman, D., Hershcovich, D., Hershkovitz, O., Hickerson, B., Holtz, K., Kadry, W., Koyfman, A., Ludden, J., Meissner, C., Nahir, A., Pratt, R.R., Schiffli, M., Onge, B., Thompto, B., Tsanko, E., Ziv, A.: Verification of transactional memory in POWER8. In: Proceedings of the 51st Annual Design Automation Conference, pp. 58:1–58:6 (2014)
Agarwal, R., Garg, P., Torrellas, J.: Rebound: scalable checkpointing for coherent shared memory. In: Proceedings of the 38th Annual International Symposium on Computer Architecture, ISCA 2011, pp. 153–164 (2011)
Franklin, M., et al.: Built-in Self-Testing of Random-Access Memories. IEEE Computer 23(10) (October 1990)
Wells, P.M., et al.: Adapting to Intermittent Faults in Multicore Systems. In: Proceedings of the 13th ASPLOS, pp. 255–264 (2008)
Baumann, R.: Soft errors in advanced computer systems. IEEE Design and Test 22, 258–266 (2005)
Bidokhti, N.: SEU Concept to Reality (Allocation, Prediction, Mitigation). In: RAMS (2010)
Bieniusa, A., Fuhrmann, T.: Consistency in hindsight: A fully decentralized STM algorithm, pp. 1–12 (2010)
Bocchino, R.L., Adve, V.S., Chamberlain, B.L.: Software transactional memory for large scale clusters. In: Proceedings of the 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pp. 247–258 (2008)
Carvalho, N., Romano, P., Rodrigues, L.: A generic framework for replicated software transactional memories. In: Proceedings of the Tenth IEEE International Symposium on Network Computing and Applications, pp. 271–274 (2011)
Chen, D.: Local Rollback for Fault-Tolerance in Parallel Computing systems, United States Patent Application, 12/696780 (2011)
Constantinescu, C.: Trends and challenges in VLSI circuit reliability. IEEE Micro 23, 14–19 (2003)
Couceiro, M., Romano, P., Carvalho, N., Rodrigues, L.: D2STM: Dependable distributed software transactional memory. In: Proceedings of the 2009 15th IEEE Pacific Rim International Symposium on Dependable Computing, pp. 307–313 (2009)
Dhoke, A., Ravindran, B., Zhang, B.: On closed nesting and checkpointing in fault-tolerant distributed transactional memory. In: IEEE International Symposium on Parallel and Distributed Processing, pp. 41–52 (2013)
Fetzer, C., Felber, P.: Transactional memory for dependable embedded systems. In: 7th Workshop on Hot Topics in System Dependability (HotDep), pp. 223–227. IEEE (2011)
Gong, R., Dai, K., Wang, Z.: Transient Fault Recovery on Chip Multiprocessor based on Dual Core Redundancy and Context Saving. In: International Conference for Young Computer Scientists, pp. 148–153 (2008)
Hammond, L., Wong, V., Chen, M., Carlstrom, B.D., Davis, J.D., Hertzberg, B., Prabhu, M.K., Wijaya, H., Kozyrakis, C., Olukotun, K.: Transactional memory coherence and consistency. SIGARCH Computer Architecture News 32(2), 102 (2004)
Kotselidis, C., Ansari, M., Jarvis, K., Luján, M., Kirkham, C., Watson, I.: DiSTM: A software transactional memory framework for clusters. In: Proceedings of the International Conference on Parallel Processing (ICPP), pp. 51–58 (2008)
Michalak, S.E., Harris, K.W., Hengartner, N.W., Takala, B.E., Wender, S.A.: Predicting the Number of Fatal Soft Errors in Los Alamos National Laboratory’s ASC Q Computer. IEEE Transactions on Device and Materials Reliability 5, 329–335 (2005)
Moore, K., Bobba, J., Moravan, M., Hill, M., Wood, D.: LogTM: log-based transactional memory. In: Proceedings of the 12th International Symposium on High-Performance Computer Architecture (HPCA), pp. 254–265. Austin, Texas (2006)
Mukherjee, S.S., Kontz, M., Reinhardt, S.K.: Detailed Design and Evaluation of Redundant Multithreading Alternatives. In: Proceedings of the International Symposium on Computer Architecture, pp. 99–110 (2002)
Mukherjee, S.: Architecture Design for Soft Errors. Morgan Kaufmann (2008)
Rashid, L., Pattabiraman, K., Gopalakrishnan, S.: Towards understanding the effects of intermittent hardware faults on programs. Dependable Systems and Networks Workshops, 101–106 (2010)
Riegel, T., Felber, P., Fetzer, C.: Composable error recovery with transactional memory. Bulletin of the European Association for Theoretical Computer Science (BEATCS) 99 (2009)
Romano, P., Rodrigues, L., Carvalho, N., Cachopo, J.: Cloud-TM: Harnessing the cloud with distributed transactional memories. SIGOPS Oper. Syst. Rev. 44(2), 1–6 (2010)
Sanchez, D., Cebrian, J.M., Garcia, J.M., Aragon, J.L.: Soft-error mitigation by means of decoupled transactional memory threads. Distributed Computing, 1–16 (2014)
Slegel, T.J.A.: IBM’s S/390 G5 Microprocessor Design. IEEE Micro 19, 12–23 (1999)
Tomić, S., Perfumo, C., Kulkarni, C., Armejach, A., Cristal, A., Unsal, O., Harris, T., Valero, M.: EazyHTM: eager-lazy hardware transactional memory. In: MICRO-42: Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture, New York, NY, USA, pp. 145–155 (2009)
Wamhoff, J.-T., Schwalbe, M., Faqeh, R., Fetzer, C., Felber, P.: Transactional encoding for tolerating transient hardware errors. In: Higashino, T., Katayama, Y., Masuzawa, T., Potop-Butucaru, M., Yamashita, M. (eds.) SSS 2013. LNCS, vol. 8255, pp. 1–16. Springer, Heidelberg (2013)
Weaver, C., Emer, J., Mukherjee, S.S., Reinhardt, S.K.: Techniques to Reduce the Soft Error Rate of a High-Performance Microprocessor. In: Proceedings of the 31st Annual International Symposium on Computer Architecture, pp. 264–275 (2004)
Wood, A., Jardine, R., Bartlett, W.: Data integrity in HP NonStop servers. In: Workshop on SELSE (2006)
Yalcin, G., Unsal, O., Cristal, A.: FaulTM: Fault-Tolerance Using Hardware Transactional Memory. In: Design, Automation and Test in Europe DATE (2012)
Yalcin, G., Unsal, O., Cristal, A.: Fault Tolerance for Multi-Threaded Applications by Leveraging Hardware Transactional Memory. In: International Conference on Computing Frontiers (2013)
Yalcin, G., Unsal, O., Cristal, A., Hur, I., Valero, M.: FaulTM: Fault-Tolerance Using Hardware Transactional Memory. In: Workshop on Parallel Execution of Sequential Programs on Multi-Core Architecture PESPMA (2010)
Yalcin, G., Unsal, O.S., Cristal, A., Hur, I., Valero, M.: SymptomTM: Symptom-Based Error Detection and Recovery Using Hardware Transactional Memory. In: International Conference on Parallel Architectures and Compilation Techniques (PACT), pp. 199–200. IEEE (2011)
Yoo, R.M., Hughes, C.J., Lai, K., Rajwar, R.: Performance evaluation of Intel transactional synchronization extensions for high-performance computing. In: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, pp. 19:1–19:11 (2013)
Copyright information
© 2015 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Yalcin, G., Unsal, O. (2015). Transactional Memory for Reliability. In: Guerraoui, R., Romano, P. (eds) Transactional Memory. Foundations, Algorithms, Tools, and Applications. Lecture Notes in Computer Science, vol 8913. Springer, Cham. https://doi.org/10.1007/978-3-319-14720-8_13
DOI: https://doi.org/10.1007/978-3-319-14720-8_13
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-14719-2
Online ISBN: 978-3-319-14720-8
eBook Packages: Computer Science, Computer Science (R0)