Abstract
Self-diagnosis of systems comprising large numbers of processors has been studied extensively in the literature. The APEmille SIMD machine, a project of the National Institute of Nuclear Physics (INFN) of Italy, was offered as a test bed for a self-diagnosis strategy based on a comparison model.
Because of the general machine architecture and some design constraints, the diagnosis support built into APEmille does not completely fulfill the standard assumptions of the existing diagnosis models. This circumstance led to the development of a specific diagnostic model derived from the PMC and comparison models. The new model introduces the concepts of direction-related and direction-independent faults. The consistency of this model with the APEmille architecture is discussed, and possible fault scenarios that are particularly critical for the correctness of the diagnosis are examined. It is shown that the limited hardware redundancy, extended with simple functional tests, is sufficient to obtain a valid diagnosis with the presented model.
Copyright information
© 1999 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Chessa, S., Sallay, B., Maestrini, P. (1999). Diagnostic Model and Diagnosis Algorithm of a SIMD Computer. In: Hlavička, J., Maehle, E., Pataricza, A. (eds) Dependable Computing — EDCC-3. EDCC 1999. Lecture Notes in Computer Science, vol 1667. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-48254-7_20
DOI: https://doi.org/10.1007/3-540-48254-7_20
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-66483-3
Online ISBN: 978-3-540-48254-3
eBook Packages: Springer Book Archive