Self-Adapting Resource Escalation for Resilient Signal Processing Architectures

Imran, Naveed; DeMara, Ronald F.; Lee, Jooheung; Huang, Jian

doi:10.1007/s11265-013-0811-x

Published: 18 July 2013

Journal of Signal Processing Systems, Volume 77, pages 257–280 (2014)

Naveed Imran¹, Ronald F. DeMara¹, Jooheung Lee² & Jian Huang³

Abstract

To cope with susceptibility to aging and process variation in the deep submicron era, signal processing systems must maintain quality and throughput requirements despite the vulnerabilities of the underlying computational devices. The Priority Using Resource Escalation (PURE) online resiliency approach is developed herein to maintain throughput quality based on the output Peak Signal-to-Noise Ratio (PSNR) or another health metric. PURE is evaluated using an H.263 video encoder and is shown to maintain signal processing throughput despite hardware faults. Its performance is compared to two alternative reconfiguration algorithms, which prioritize optimizing the number of reconfiguration occurrences and the fault detection latency, respectively. For a typical benchmark video sequence, PURE is shown to maintain a PSNR baseline near 32 dB. Compared to the alternatives, PURE maintains a PSNR within 4.02 dB to 6.67 dB of the fault-free baseline by escalating healthy resources to higher-priority signal processing functions. The diagnosability, reconfiguration latency, and resource overhead of each approach are analyzed. The results indicate the benefits of priority-aware resiliency over conventional redundancy in terms of fault recovery, power consumption, and resource-area requirements.
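
As a rough illustration of the PSNR health metric mentioned in the abstract, the following sketch computes PSNR between a reference frame and a degraded frame using the standard definition, 10·log10(peak² / MSE). The function names `psnr` and `is_healthy`, the flat-list frame representation, and the 3 dB tolerance are assumptions made for this example; only the approximate 32 dB baseline comes from the abstract, and this is not the authors' implementation.

```python
import math


def psnr(reference, degraded, peak=255):
    """Return the PSNR in dB between two equal-length sequences of pixel values."""
    if len(reference) != len(degraded) or not reference:
        raise ValueError("frames must be non-empty and of equal size")
    # Mean squared error over all pixel positions.
    mse = sum((r - d) ** 2 for r, d in zip(reference, degraded)) / len(reference)
    if mse == 0:
        return math.inf  # identical frames: PSNR is unbounded
    return 10 * math.log10(peak ** 2 / mse)


def is_healthy(psnr_db, baseline_db=32.0, tolerance_db=3.0):
    """Hypothetical health check: PSNR stays within tolerance_db of the baseline.

    tolerance_db is an illustrative value, not a figure from the article.
    """
    return psnr_db >= baseline_db - tolerance_db
```

A monitor built along these lines would evaluate `is_healthy(psnr(ref, out))` once per evaluation window and trigger diagnosis only when the check fails.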

Abbreviations

G(V,E) :: An undirected graph, where V is the set of all nodes and E is the set of edges
C :: Connectivity matrix
C(t) :: Connectivity C at time instant t
\(\mathbf{\Psi}\) :: Syndrome Matrix
\(\Phi\) :: Fitness State Vector
\(\hat{\Phi}\) :: Estimated Fitness State Vector
P :: Priority Vector
t(G) :: Diagnosability of G
d(G) :: Average degree of a node in G
\(V_{a}\) :: Set of active nodes
\(V_{s}\) :: Set of Reconfigurable Slacks (RS) used to diagnose the active nodes by comparison-based diagnosis
\(V_{h}\) :: Set of healthy nodes
\(V_{NMR}\) :: Set for N-Modular Redundancy checking
M :: Number of PRRs
N :: Total number of nodes
\(N_{a}\) :: Number of nodes in the datapath (i.e., \(|V_a|\))
\(N_{s}\) :: Number of Reconfigurable Slacks (i.e., \(|V_s|\))
\(N_{d}\) :: Number of defectives
r :: Testing arrangement instance (may involve multiple reconfigurations)
s :: Slack update instance (a slack is reconfigured with some function)
t :: Time instant
\(T_{recon}\) :: Reconfiguration Time
\(T_{eval}\) :: Evaluation window period
F :: Function assignment vector
\(F^{*}\) :: Solution vector F after recovery
\(T_{d}\) :: Latency of fault detection
\(T_{diag}\) :: Latency of fault diagnosis
\(T_{rec}\) :: Recovery time
\(N_{r}\) :: Number of testing arrangement instances before the diagnosis completes
\(N_{sup}\) :: Number of slack updates
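
To make the notation above more concrete, here is a minimal sketch of how a function assignment vector F, an estimated fitness state vector \(\hat{\Phi}\), and a priority vector P might combine to produce a recovered assignment \(F^{*}\). It uses a simple greedy rule (highest-priority function on the healthiest node), which is an assumption for illustration and not the PURE algorithm as published; the function names `dct`, `me`, and `quant`, the numeric fitness scale, and the helper `escalate` are all hypothetical.

```python
def escalate(assignment, fitness, priority):
    """Return a recovered assignment F* from F, an estimated fitness vector, and P.

    assignment: list where assignment[i] is the function on node i (None = slack)
    fitness:    list of per-node fitness estimates, higher means healthier
    priority:   dict mapping function name to priority, higher means more important
    """
    # Order functions from most to least important.
    functions = sorted(
        (f for f in assignment if f is not None),
        key=lambda f: priority[f],
        reverse=True,
    )
    # Order nodes from healthiest to least healthy (ties keep index order).
    nodes = sorted(range(len(assignment)), key=lambda i: fitness[i], reverse=True)
    recovered = [None] * len(assignment)
    for f, node in zip(functions, nodes):
        recovered[node] = f  # leftover (least healthy) nodes remain slack
    return recovered


# Hypothetical example: node 0 is degraded, node 3 is a spare slack.
F = ["dct", "me", "quant", None]
fitness_estimate = [0.2, 1.0, 1.0, 1.0]
P = {"dct": 3, "me": 2, "quant": 1}
F_star = escalate(F, fitness_estimate, P)
print(F_star)  # [None, 'dct', 'me', 'quant']
```

This greedy rule may move more functions than strictly necessary; a scheme that also accounts for \(T_{recon}\) would try to limit the number of reconfigurations.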

Author information

Authors and Affiliations

Department of Electrical Engineering and Computer Science, University of Central Florida, Orlando, FL, 32816, USA
Naveed Imran & Ronald F. DeMara
Department of Electronic and Electrical Engineering, Hongik University, Sejong, 339-701, Korea
Jooheung Lee
Advanced Micro Devices (AMD), Sunnyvale, CA, 94088, USA
Jian Huang

Corresponding author

Correspondence to Naveed Imran.

About this article

Cite this article

Imran, N., DeMara, R.F., Lee, J. et al. Self-Adapting Resource Escalation for Resilient Signal Processing Architectures. J Sign Process Syst 77, 257–280 (2014). https://doi.org/10.1007/s11265-013-0811-x

Received: 15 July 2012
Revised: 11 June 2013
Accepted: 18 June 2013
Published: 18 July 2013
Issue Date: December 2014
DOI: https://doi.org/10.1007/s11265-013-0811-x
