
Diagnostic Model and Diagnosis Algorithm of a SIMD Computer

  • Conference paper
Dependable Computing — EDCC-3 (EDCC 1999)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1667)

Abstract

Self-diagnosis of systems comprising large numbers of processors has been studied extensively in the literature. The APEmille SIMD machine, a project of the Italian National Institute for Nuclear Physics (INFN), was offered as a test bed for a self-diagnosis strategy based on a comparison model.

Because of the general machine architecture and some design constraints, the diagnosis support built into APEmille does not completely fulfil the standard assumptions of existing diagnosis models. This led to the development of a specific diagnostic model derived from the PMC and comparison models, which introduces the concept of direction-related and direction-independent faults. The consistency of this model with the APEmille architecture is discussed, and fault scenarios that are particularly critical for the correctness of the diagnosis are examined. It is shown that the limited hardware redundancy, extended with simple functional tests, is sufficient to obtain a valid diagnosis with the presented model.
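To make the comparison-based starting point concrete, the following Python sketch illustrates a classic comparison model in the style of Malek's, not the APEmille model itself: it does not represent the direction-related or direction-independent faults introduced in the paper. The assumptions are ours: processors form a 2-D grid, each unit is compared with its east and south neighbours, comparators are reliable, and a faulty unit never produces the same result as its partner. The function names `grid_edges`, `syndrome` and `diagnose` are hypothetical.

```python
from itertools import product


def grid_edges(rows, cols):
    """Comparison pairs between each unit and its east and south neighbours."""
    edges = []
    for r, c in product(range(rows), range(cols)):
        if c + 1 < cols:
            edges.append(((r, c), (r, c + 1)))  # east neighbour
        if r + 1 < rows:
            edges.append(((r, c), (r + 1, c)))  # south neighbour
    return edges


def syndrome(edges, faulty):
    """Outcome per compared pair: 0 if both units are fault-free, 1 otherwise.

    Assumes reliable comparators and that a faulty unit never produces the
    same result as its partner, so any faulty unit forces a mismatch.
    """
    return {(u, v): int(u in faulty or v in faulty) for (u, v) in edges}


def diagnose(synd):
    """Label each unit 'ok', 'faulty' or 'unknown' from the syndrome alone."""
    status = {}
    for (u, v) in synd:
        status.setdefault(u, "unknown")
        status.setdefault(v, "unknown")
    # Under the assumptions above, a 0 outcome proves both units fault-free.
    for (u, v), outcome in synd.items():
        if outcome == 0:
            status[u] = status[v] = "ok"
    # A 1 outcome against a proven fault-free unit proves the partner faulty.
    for (u, v), outcome in synd.items():
        if outcome == 1:
            if status[u] == "ok":
                status[v] = "faulty"
            elif status[v] == "ok":
                status[u] = "faulty"
    return status


if __name__ == "__main__":
    edges = grid_edges(3, 3)
    result = diagnose(syndrome(edges, faulty={(0, 1), (1, 0)}))
    for unit in sorted(result):
        print(unit, result[unit])
```

Running the script labels unit (0, 0) as unknown: it is fault-free, but both of its neighbours are faulty, so comparison outcomes alone cannot classify it under these assumptions. A single pass suffices because only 0 outcomes establish fault-free units, and a 1 outcome against a faulty unit carries no information about its partner.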




Copyright information

© 1999 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Chessa, S., Sallay, B., Maestrini, P. (1999). Diagnostic Model and Diagnosis Algorithm of a SIMD Computer. In: Hlavička, J., Maehle, E., Pataricza, A. (eds) Dependable Computing — EDCC-3. EDCC 1999. Lecture Notes in Computer Science, vol 1667. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-48254-7_20


  • DOI: https://doi.org/10.1007/3-540-48254-7_20

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-66483-3

  • Online ISBN: 978-3-540-48254-3

  • eBook Packages: Springer Book Archive
