Resource-Driven Optimizations for Transient-Fault Detecting SuperScalar Microarchitectures

  • Conference paper
Advances in Computer Systems Architecture (ACSAC 2005)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 3740)

Abstract

Increasing microprocessor vulnerability to soft errors induced by neutron and alpha particle strikes prevents aggressive scaling and integration of transistors in future technologies if left unaddressed. Previously proposed instruction-level redundant execution, as a means of detecting errors, suffers from a severe performance loss due to the resource shortage caused by the large number of redundant instructions injected into the superscalar core. In this paper, we propose to apply three architectural enhancements, namely 1) floating-point unit sharing (FUS), 2) prioritizing primary instructions (PRI), and 3) early retiring of redundant instructions (ERT), that enable transient-fault detecting redundant execution in superscalar microarchitectures with a much smaller performance penalty, while maintaining the original full coverage of soft errors. In addition, our enhancements are compatible with many other proposed techniques, allowing for further performance improvement.

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Hu, J.S., Link, G.M., John, J.K., Wang, S., Ziavras, S.G. (2005). Resource-Driven Optimizations for Transient-Fault Detecting SuperScalar Microarchitectures. In: Srikanthan, T., Xue, J., Chang, C.-H. (eds) Advances in Computer Systems Architecture. ACSAC 2005. Lecture Notes in Computer Science, vol 3740. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11572961_17

  • DOI: https://doi.org/10.1007/11572961_17

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-29643-0

  • Online ISBN: 978-3-540-32108-8

  • eBook Packages: Computer Science, Computer Science (R0)
