A Fault-Tolerant, Dynamically Scheduled Pipeline Structure for Chip Multiprocessors

  • Conference paper
Computer Safety, Reliability, and Security (SAFECOMP 2011)

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 6894)

Abstract

This paper presents a dynamically scheduled pipeline structure for chip multiprocessors (CMPs). The technique exploits the redundancy already present in Simultaneous Multithreading (SMT), superscalar chip multiprocessors to provide low-overhead, broad fault coverage at the cost of some performance degradation. The pipeline structure operates in two modes: 1) high-performance and 2) highly-reliable. In high-performance mode, each core works as a real SMT, superscalar processor. The main contributions of the highly-reliable mode are: 1) enhancing the reliability of the system without adding extra redundancy strictly for fault tolerance, 2) detecting both transient and permanent faults, and 3) recovering from existing faults. The experimental results show that the diagnosis mechanism diagnoses faults quickly and accurately. The fault detection latency of this technique is equal to the pipeline length of the processor, while it provides high fault detection coverage. Moreover, the reliable processor can function quite capably in the presence of both transient and permanent faults, despite not using redundancy beyond what is already available in a modern microprocessor. In addition, in the highly-reliable mode, static and dynamic power consumption decline by 25% and 36%, respectively.
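
The abstract does not describe the detection and diagnosis mechanism itself, so the following is only a generic illustration of the idea behind redundant execution: run the same operation on two existing units, compare the results, and retry on a mismatch to separate transient from permanent faults. All names here (`execute_checked`, `make_transient_faulty`, `stuck_add`) are hypothetical and are not taken from the paper.

```python
from typing import Callable, Optional, Tuple

# Hypothetical model of a functional unit: takes two operands, returns a result.
Unit = Callable[[int, int], int]


def healthy_add(a: int, b: int) -> int:
    """Fault-free adder."""
    return a + b


def make_transient_faulty(flips: int) -> Unit:
    """Adder that flips the lowest result bit for its first `flips` calls only."""
    state = {"left": flips}

    def unit(a: int, b: int) -> int:
        if state["left"] > 0:
            state["left"] -= 1
            return (a + b) ^ 1
        return a + b

    return unit


def stuck_add(a: int, b: int) -> int:
    """Adder with the lowest result bit stuck at 1 (a permanent fault)."""
    return (a + b) | 1


def execute_checked(
    a: int, b: int, primary: Unit, checker: Unit, max_retries: int = 1
) -> Tuple[Optional[int], str]:
    """Run the operation redundantly and compare the two results.

    Returns (result, diagnosis). A mismatch that disappears on retry is
    labeled "transient"; one that persists is labeled "permanent".
    """
    for attempt in range(max_retries + 1):
        r1, r2 = primary(a, b), checker(a, b)
        if r1 == r2:
            return r1, ("ok" if attempt == 0 else "transient")
    return None, "permanent"
```

With only two copies, a persistent mismatch shows that a fault exists but not which unit is faulty, which is why this sketch returns no result in the permanent case; identifying and bypassing the faulty unit would need a further diagnosis step that is outside this illustration.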

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Aliee, H., Zarandi, H.R. (2011). A Fault-Tolerant, Dynamically Scheduled Pipeline Structure for Chip Multiprocessors. In: Flammini, F., Bologna, S., Vittorini, V. (eds) Computer Safety, Reliability, and Security. SAFECOMP 2011. Lecture Notes in Computer Science, vol 6894. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24270-0_24

  • DOI: https://doi.org/10.1007/978-3-642-24270-0_24

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-24269-4

  • Online ISBN: 978-3-642-24270-0

  • eBook Packages: Computer Science, Computer Science (R0)
