Abstract
This paper presents a dynamically scheduled pipeline structure for chip multiprocessors (CMPs). The technique exploits the redundancy already present in simultaneous multithreading (SMT), superscalar chip multiprocessors to provide low-overhead, broad fault coverage at the cost of some performance degradation. The pipeline structure operates in two modes: 1) high-performance and 2) highly-reliable. In high-performance mode, each core operates as a regular SMT, superscalar processor. The highly-reliable mode makes three main contributions: 1) it enhances the reliability of the system without adding extra redundancy strictly for fault tolerance, 2) it detects both transient and permanent faults, and 3) it recovers from existing faults. The experimental results show that the diagnosis mechanism diagnoses faults quickly and accurately. The fault detection latency of the technique equals the pipeline length of the processor, and the technique provides high fault detection coverage. Moreover, the reliable processor continues to function capably in the presence of both transient and permanent faults, despite using no redundancy beyond what is already available in a modern microprocessor. In addition, in the highly-reliable mode, static and dynamic power consumption are reduced by 25% and 36%, respectively.
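As a rough illustration of how redundant execution can separate transient from permanent faults, the following Python sketch models a highly-reliable mode on three toy functional units. It is not the paper's mechanism, which the abstract does not describe in detail. Every name here (`Unit`, `run_reliable`, `transient_flip`, `stuck_bit`), the retry count, and the use of a third unit as a tie-breaker are assumptions made purely for illustration.

```python
class Unit:
    """Toy functional unit (hypothetical); `corrupt` injects a fault into results."""

    def __init__(self, name, corrupt=None):
        self.name = name
        self.corrupt = corrupt
        self.disabled = False

    def execute(self, a, b):
        result = a + b
        return self.corrupt(result) if self.corrupt else result


def transient_flip():
    """Return a fault injector that flips bit 0 of the first result only."""
    state = {"fired": False}

    def corrupt(x):
        if not state["fired"]:
            state["fired"] = True
            return x ^ 1
        return x

    return corrupt


def stuck_bit(x):
    """Fault injector modelling a permanent stuck-at-1 fault on bit 0."""
    return x | 1


def run_reliable(a, b, units, retries=1):
    """Run a + b on two enabled units and compare the results.

    A mismatch that disappears on retry is classified as transient.
    A mismatch that persists is classified as permanent: a third unit
    (assumed to be available) breaks the tie and the odd one out is disabled.
    """
    active = [u for u in units if not u.disabled]
    first, second = active[0], active[1]
    for attempt in range(retries + 1):
        r1, r2 = first.execute(a, b), second.execute(a, b)
        if r1 == r2:
            return r1, ("ok" if attempt == 0 else "transient")
    if len(active) < 3:
        raise RuntimeError("no spare unit available for diagnosis")
    r3 = active[2].execute(a, b)
    if r3 == r1:
        faulty = second
    elif r3 == r2:
        faulty = first
    else:
        raise RuntimeError("diagnosis inconclusive: all three results differ")
    faulty.disabled = True
    return r3, "permanent:" + faulty.name
```

In this sketch, a unit diagnosed as permanently faulty is simply excluded from later calls, so execution continues on the remaining units with less redundancy.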
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Aliee, H., Zarandi, H.R. (2011). A Fault-Tolerant, Dynamically Scheduled Pipeline Structure for Chip Multiprocessors. In: Flammini, F., Bologna, S., Vittorini, V. (eds) Computer Safety, Reliability, and Security. SAFECOMP 2011. Lecture Notes in Computer Science, vol 6894. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24270-0_24
DOI: https://doi.org/10.1007/978-3-642-24270-0_24
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-24269-4
Online ISBN: 978-3-642-24270-0
eBook Packages: Computer Science, Computer Science (R0)