research-article

On-chip sensor networks for soft-error tolerant real-time multiprocessor systems-on-chip

Published: 06 March 2014

Abstract

As transistor density continues to increase with the advent of nanotechnology, soft errors appear more frequently, and the resulting reliability issues are becoming critical for the design of future embedded multiprocessor systems. State-of-the-art soft-error protection techniques for multiprocessor systems incur either high chip cost and area overhead or severe performance degradation and high energy consumption, and so do not meet the growing demand for both high performance and dependability. In this article, we present a systematic approach, Sensor Networks-on-Chip (SENoC), that collaboratively and efficiently manages on-chip applications and overcomes reliability threats in Multiprocessor Systems-on-Chip (MPSoCs). We propose a hardware-software collaborative solution to the soft-error problem: a hardware-based on-chip sensor network detects soft errors, and a software-based recovery mechanism corrects them. A two-step scheduling scheme provides reliable application and chip management, combining an offline static optimization stage that maximizes application performance with an online lightweight dynamic adjustment stage that handles runtime variations and exceptions. This strategy introduces only negligible hardware design overhead and much lower software control and execution overhead than existing techniques, so performance degradation and energy consumption are greatly reduced. We build a cycle-accurate simulator in SystemC and verify the effectiveness of our technique by comparing its performance with that of related techniques on several real-world applications.
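
The two-step scheduling idea can be illustrated with a toy model. The sketch below is not the SENoC algorithm, whose details are not given here; it is a minimal sketch under stated assumptions: the application is a task graph with known worst-case execution times, NoC communication delays are ignored, the offline stage is a simple rank-based list scheduler in the style of HEFT (a swapped-in heuristic), and software recovery is modeled as re-executing the faulty task once on its assigned processor after the sensor network flags the error at task completion. All names (static_schedule, adjust_on_error, Slot, the example tasks and deadline) are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical model: tasks maps task name -> worst-case execution time,
# succ maps task name -> list of successor task names (a DAG).
# NoC communication delays are ignored for brevity.


@dataclass(frozen=True)
class Slot:
    task: str
    proc: int
    start: float
    finish: float


def predecessors(tasks: dict[str, float], succ: dict[str, list[str]]) -> dict[str, list[str]]:
    pred: dict[str, list[str]] = {t: [] for t in tasks}
    for t, children in succ.items():
        for c in children:
            pred[c].append(t)
    return pred


def upward_rank(tasks, succ):
    """Length of the longest execution-time path from each task to an exit task."""
    rank: dict[str, float] = {}

    def visit(t: str) -> float:
        if t not in rank:
            rank[t] = tasks[t] + max((visit(c) for c in succ.get(t, [])), default=0.0)
        return rank[t]

    for t in tasks:
        visit(t)
    return rank


def static_schedule(tasks, succ, num_procs: int) -> dict[str, Slot]:
    """Offline stage: list scheduling in decreasing upward rank, earliest-finish processor.

    With positive execution times, decreasing rank is a valid topological order.
    """
    pred = predecessors(tasks, succ)
    rank = upward_rank(tasks, succ)
    proc_free = [0.0] * num_procs
    sched: dict[str, Slot] = {}
    for t in sorted(tasks, key=lambda name: -rank[name]):
        ready = max((sched[p].finish for p in pred[t]), default=0.0)
        best = min(range(num_procs), key=lambda p: max(ready, proc_free[p]) + tasks[t])
        start = max(ready, proc_free[best])
        sched[t] = Slot(t, best, start, start + tasks[t])
        proc_free[best] = sched[t].finish
    return sched


def adjust_on_error(sched, tasks, succ, faulty: Optional[str] = None) -> dict[str, Slot]:
    """Online stage: assume the sensor network flags `faulty` when it finishes and
    software recovery re-executes it once on the same processor. Mapping and
    per-processor order are kept; only start/finish times are recomputed."""
    pred = predecessors(tasks, succ)
    proc_free: dict[int, float] = {}
    new: dict[str, Slot] = {}
    # Sorting by static start time visits every predecessor before its successors.
    for s in sorted(sched.values(), key=lambda slot: (slot.start, slot.proc)):
        ready = max((new[p].finish for p in pred[s.task]), default=0.0)
        start = max(ready, proc_free.get(s.proc, 0.0))
        runs = 2 if s.task == faulty else 1  # original run + one re-execution
        new[s.task] = Slot(s.task, s.proc, start, start + runs * tasks[s.task])
        proc_free[s.proc] = new[s.task].finish
    return new


def makespan(sched) -> float:
    return max(s.finish for s in sched.values())


if __name__ == "__main__":
    tasks = {"A": 2.0, "B": 3.0, "C": 1.0, "D": 2.0}
    succ = {"A": ["B", "C"], "B": ["D"], "C": ["D"]}
    deadline = 9.0  # hypothetical real-time deadline
    plan = static_schedule(tasks, succ, num_procs=2)
    for fault in (None, "C", "B"):
        span = makespan(adjust_on_error(plan, tasks, succ, fault))
        print(f"fault={fault}: makespan={span}, deadline met={span <= deadline}")
```

Keeping the static mapping and per-processor order, and only recomputing start times, makes the online step a single pass over the schedule, in the spirit of a lightweight runtime stage. In this toy example the static makespan is 7; a soft error in task C is absorbed by slack on the second processor, while an error in B, which lies on the critical path, pushes the makespan to 10 and would miss the hypothetical deadline of 9.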

Cited By

  • (2018) Task mapping and scheduling for network-on-chip based multi-core platform with transient faults. Journal of Systems Architecture 83, 34-56. DOI: 10.1016/j.sysarc.2018.01.002. Online publication date: Feb-2018.
  • (2018) Supervised deep hashing for scalable face image retrieval. Pattern Recognition 75:C, 25-32. DOI: 10.1016/j.patcog.2017.03.028. Online publication date: 1-Mar-2018.
  • (2016) Application Mapping and Scheduling for Network-on-Chip-Based Multiprocessor System-on-Chip With Fine-Grain Communication Optimization. IEEE Transactions on Very Large Scale Integration (VLSI) Systems 24:10, 3027-3040. DOI: 10.1109/TVLSI.2016.2535359. Online publication date: 1-Oct-2016.
  • (2016) Bit selection via walks on graph for hash-based nearest neighbor search. Neurocomputing 213:C, 137-146. DOI: 10.1016/j.neucom.2015.11.132. Online publication date: 12-Nov-2016.

Published In

ACM Journal on Emerging Technologies in Computing Systems, Volume 10, Issue 2
February 2014
143 pages
ISSN:1550-4832
EISSN:1550-4840
DOI:10.1145/2590828
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 06 March 2014
Accepted: 01 November 2012
Revised: 01 April 2012
Received: 01 September 2011
Published in JETC Volume 10, Issue 2

Author Tags

  1. Sensor network
  2. networks-on-chip
  3. performance
  4. reliability
  5. soft error

Qualifiers

  • Research-article
  • Research
  • Refereed

Bibliometrics

Article Metrics

  • Downloads (last 12 months): 4
  • Downloads (last 6 weeks): 2
Reflects downloads up to 20 Jan 2025