Abstract
The paper presents a novel modelling technique for system-level fault diagnosis in massively parallel multiprocessors, based on reformulating the syndrome decoding problem as a constraint satisfaction problem (CSP). The CSP-based approach can handle detailed and inhomogeneous functional fault models at a level of detail comparable to the Russell-Kime model [18]. Multiple-valued logic is used to describe system components that have multiple fault modes. The granularity of the models can be adjusted to the diagnostic resolution of the target without altering the methodology. Two algorithms for the Parsytec GCel massively parallel system illustrate the approach: the centralized method uses a detailed system model and provides a fine-grained diagnostic image for off-line evaluation, while the distributed method uses a simplified model to make fast decisions for reconfiguration control.
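To make the reformulation concrete, the following is a minimal sketch of syndrome decoding posed as a CSP. It is not the paper's algorithm: it assumes the simple two-valued test invalidation model of Preparata, Metze and Chien [17], in which a fault-free tester reports the true state of the tested unit and a faulty tester may report anything, and it solves the CSP by brute-force enumeration rather than by a consistency or backtracking method. The names `consistent`, `decode`, and `max_faults`, as well as the five-unit ring, are hypothetical and chosen only for illustration.

```python
from itertools import product

def consistent(states, syndrome):
    """Check one candidate fault pattern against every test outcome.

    states[i] is 0 (fault-free) or 1 (faulty); this binary domain is an
    assumption of the sketch, not the multiple-valued model of the paper.
    """
    for (tester, tested), outcome in syndrome.items():
        # A fault-free tester must report the true state of the tested unit.
        # A faulty tester may report anything, so it imposes no constraint.
        if states[tester] == 0 and states[tested] != outcome:
            return False
    return True

def decode(n_units, syndrome, max_faults):
    """Return all fault patterns with at most max_faults faulty units
    that satisfy every constraint derived from the syndrome."""
    solutions = []
    for states in product((0, 1), repeat=n_units):
        if sum(states) <= max_faults and consistent(states, syndrome):
            solutions.append(states)
    return solutions

# Hypothetical example: a ring of 5 units where unit i tests unit (i + 1) % 5.
# Outcome 1 means "test failed"; here only the test of unit 2 fails.
syndrome = {(0, 1): 0, (1, 2): 1, (2, 3): 0, (3, 4): 0, (4, 0): 0}
print(decode(5, syndrome, max_faults=1))  # [(0, 0, 1, 0, 0)]
```

In this formulation, components with several fault modes would correspond to variables with larger domains, and a different invalidation model would change only the constraint inside `consistent`.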
References
E. Selényi, “Generalization of System-Level Diagnosis Theory,” D.Sc. Thesis, Budapest, Hungarian Academy of Sciences, 1985.
A. Pataricza, K. Tilly, E. Selényi, M. Dal Cin, “A Constraint Based Approach to System-Level Diagnosis,” Internal report 4/1994, University of Erlangen-Nürnberg, 1994.
A. Petri, “A Constraint Based Algorithm for System Level Diagnosis,” Diploma Thesis, Technical University of Budapest, 1994.
A. Pataricza, K. Tilly, E. Selényi, M. Dal Cin, A. Petri, “Constraint-based System Level Diagnosis of Multiprocessor Architectures,” Proc. of 8th Symp. on Microprocessor and Microcomputer Applications, vol. 1, pp. 75–84, 1994.
P. Urbán, “A Distributed Constraint Based Diagnosis Algorithm for Multiprocessors,” Scientific Conference of the Students, Technical University of Budapest, Faculty of Electrical Engineering and Computer Science, 1995.
J. Altmann, T. Bartha, A. Pataricza, “An Event-Driven Approach to Multiprocessor Diagnosis,” Proc. of 8th Symp. on Microprocessor and Microcomputer Applications, vol. 1, pp. 109–118, 1994.
J. Altmann, T. Bartha, A. Pataricza, “On Integrating Error Detection into a Fault Diagnosis Algorithm For Massively Parallel Computers,” Proc. of IEEE IPDS '95 Symposium, pp. 154–164, 1995.
T. Bartha, “Effective Approximate Fault Diagnosis of System with Inhomogeneous Test Invalidation,” submitted to the Euromicro '96 Conference, 1996.
K. Tilly, “Constraint Based Logic Test Generation,” Ph.D. Thesis, Hungarian Academy of Sciences, 1994.
U. Montanari, “Networks of Constraints: Fundamental Properties and Applications to Picture Processing,” Information Sciences, vol. 7, pp. 95–132, 1974.
R. Mohr, T. C. Henderson, “Arc and Path Consistency Revisited,” Artificial Intelligence, vol. 28, pp. 225–233, 1986.
A. Mackworth, E. C. Freuder, “The Complexity of Some Polynomial Network Consistency Algorithms for Constraint Satisfaction Problems,” Artificial Intelligence, vol. 25, pp. 65–74, 1985.
R. Seidel, “A New Method for Solving Constraint Satisfaction Problems,” IJCAI '81, pp. 338–342, 1981.
P. van Beek, “A Binary CSP Solution Library,” available by FTP from ftp.cs.alberta.ca.
G. Kondrak, “A Theoretical Evaluation of Selected Backtracking Algorithms,” M.Sc. Thesis, University of Alberta, Edmonton, 1994.
M. Barborak, M. Malek, A. Dahbura, “The Consensus Problem in Fault-Tolerant Computing,” ACM Computing Surveys, vol. 25, no. 2, pp. 171–220, June 1993.
F. Preparata, G. Metze, R. Chien, “On the Connection Assignment Problem of Diagnosable Systems,” IEEE Trans. Comput., vol. EC-16, no. 6, pp. 848–854, Dec. 1967.
C. Kime, “System Diagnosis,” in Fault-Tolerant Computing: Theory and Techniques, D. Pradhan ed., Prentice-Hall, New York, pp. 577–623, 1985.
Copyright information
© 1996 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Altmann, J., Bartha, T., Pataricza, A., Petri, A., Urbán, P. (1996). Constraint based system-level diagnosis of multiprocessors. In: Hlawiczka, A., Silva, J.G., Simoncini, L. (eds) Dependable Computing — EDCC-2. EDCC 1996. Lecture Notes in Computer Science, vol 1150. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-61772-8_51
DOI: https://doi.org/10.1007/3-540-61772-8_51
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-61772-3
Online ISBN: 978-3-540-70677-9
eBook Packages: Springer Book Archive