
Self-Adapting Resource Escalation for Resilient Signal Processing Architectures

Journal of Signal Processing Systems

Abstract

To cope with susceptibility to aging and process variation in the deep-submicron era, signal processing systems must maintain quality and throughput requirements despite the vulnerabilities of the underlying computational devices. The Priority Using Resource Escalation (PURE) online resiliency approach is developed herein to maintain throughput quality based on the output Peak Signal-to-Noise Ratio (PSNR) or another health metric. PURE is evaluated using an H.263 video encoder and shown to maintain signal processing throughput despite hardware faults. Its performance is compared to two alternative reconfiguration algorithms that prioritize optimizing the number of reconfiguration occurrences and the fault detection latency, respectively. For a typical benchmark video sequence, PURE is shown to maintain a PSNR baseline near 32 dB. Compared to the alternatives, PURE maintains a PSNR within a difference of 4.02 dB to 6.67 dB from the fault-free baseline by escalating healthy resources to higher-priority signal processing functions. The diagnosability, reconfiguration latency, and resource overhead of each approach are analyzed. The results indicate the benefits of priority-aware resiliency over conventional redundancy in terms of fault recovery, power consumption, and resource-area requirements.
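The abstract uses output PSNR as the health metric. As a minimal sketch, assuming 8-bit samples (peak value 255) and frames flattened into plain Python lists, the standard PSNR definition can be computed as follows. The function name `psnr` and its parameters are illustrative and are not taken from the article.

```python
import math


def psnr(reference, test, peak=255.0):
    """Return the PSNR in dB between two equal-length sample sequences.

    Assumption: samples are 8-bit luminance values, so peak defaults to 255.
    """
    if len(reference) != len(test) or not reference:
        raise ValueError("sequences must be non-empty and of equal length")
    # Mean squared error between the reference and the degraded output.
    mse = sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals: PSNR is unbounded
    return 10.0 * math.log10(peak ** 2 / mse)
```

A monitoring loop could compare this value against a fault-free baseline and trigger recovery when the gap grows too large; the threshold for doing so would be application-specific.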


Abbreviations

G(V,E): An undirected graph, where V is the set of all nodes and E is the set of edges

C: Connectivity matrix

C(t): Connectivity matrix C at time instant t

\(\mathbf {\Psi }\): Syndrome matrix

\(\Phi \): Fitness state vector

\(\hat {\Phi }\): Estimated fitness state vector

P: Priority vector

t(G): Diagnosability of G

d(G): Average degree of a node in G

\(V_{a}\): Set of active nodes

\(V_{s}\): Set of Reconfigurable Slack (RS) nodes used to diagnose the active nodes by comparison-based diagnosis

\(V_{h}\): Set of healthy nodes

\(V_{NMR}\): Set for N-Modular Redundancy checking

M: Number of PRRs

N: Total number of nodes

\(N_{a}\): Number of nodes in the datapath (i.e., \(|V_a|\))

\(N_{s}\): Number of Reconfigurable Slacks (i.e., \(|V_s|\))

\(N_{d}\): Number of defective nodes

r: Testing arrangement instance (may involve multiple reconfigurations)

s: Slack update instance (a slack is reconfigured with some function)

t: Time instant

\(T_{recon}\): Reconfiguration time

\(T_{eval}\): Evaluation window period

F: Function assignment vector

\(F^{*}\): Solution vector F after recovery

\(T_{d}\): Latency of fault detection

\(T_{diag}\): Latency of fault diagnosis

\(T_{rec}\): Recovery time

\(N_{r}\): Number of testing arrangement instances before the diagnosis completes

\(N_{sup}\): Number of slack updates
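To illustrate how the priority vector P, the estimated fitness state vector \(\hat {\Phi }\), and the recovered assignment \(F^{*}\) relate, the following is a minimal sketch of priority-based escalation. It is not the article's algorithm: the encoding of P as numbers (larger means more important), of \(\hat {\Phi }\) as booleans, and the function name `escalate` are all assumptions made for illustration.

```python
def escalate(priorities, healthy):
    """Assign functions to nodes so that healthy nodes host the
    highest-priority functions (hypothetical sketch, not the paper's method).

    priorities: list P, where priorities[f] is the priority of function f
                (assumption: larger value = more important).
    healthy:    list standing in for the estimated fitness vector,
                where healthy[n] is True if node n is believed fault-free.
    Returns a list F_star where F_star[n] is the function placed on node n,
    or None if the node is left unused.
    """
    # Functions ordered from highest to lowest priority (stable for ties).
    by_priority = sorted(range(len(priorities)),
                         key=lambda f: priorities[f], reverse=True)
    # Healthy nodes are consumed first; suspect nodes get what remains.
    healthy_nodes = [n for n, ok in enumerate(healthy) if ok]
    suspect_nodes = [n for n, ok in enumerate(healthy) if not ok]
    f_star = [None] * len(healthy)
    for func, node in zip(by_priority, healthy_nodes + suspect_nodes):
        f_star[node] = func
    return f_star
```

For example, with `priorities = [3, 1, 2]` and node 0 marked faulty, the highest-priority function 0 moves to healthy node 1, while the lowest-priority function 1 is left on the suspect node 0.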


Author information

Corresponding author

Correspondence to Naveed Imran.

About this article

Cite this article

Imran, N., DeMara, R.F., Lee, J. et al. Self-Adapting Resource Escalation for Resilient Signal Processing Architectures. J Sign Process Syst 77, 257–280 (2014). https://doi.org/10.1007/s11265-013-0811-x

  • DOI: https://doi.org/10.1007/s11265-013-0811-x
