Fault tolerant design of a field data modular readout architecture for railway applications

doi:10.1016/j.ress.2015.06.008

Reliability Engineering & System Safety

Volume 142, October 2015, Pages 456-462

https://doi.org/10.1016/j.ress.2015.06.008

Highlights

• Providing designers with a standard example of a safe solution.
• Addressing the design concept of a readout interface according to widely adopted standards.
• Comparing multiple solutions in terms of RAMS parameters.
• Providing designers with a simple, general methodology for approaching similar problems.

Abstract

Modern data acquisition systems used to collect sensor signals are usually designed around performance and operating parameters mainly related to sensitivity, selectivity, resolution and stability over time. In addition to these features, field application systems must also satisfy other constraints such as reliability and availability and, depending on the specific application, particular safety requirements. This paper provides an overview of the implications of parameters such as the safety integrity level for the design of a sensor input/output hardware module. The discussion covers the overall system design, integrated with availability considerations. Considerations concerning the on-board software implementation are omitted without loss of generality. The study considers solutions suitable for railway applications such as signaling or crossing detection systems.

Introduction

The design of a sensor acquisition system able to operate in harsh environments and to meet severe constraints, not only on data acquisition performance but also on robustness in everyday use, is still an open research field. Although compact, field-distributed acquisition systems are used in a wide variety of industrial contexts, ranging from telecommunications to oil field monitoring, safety requirements are in most cases application dependent. In railway signaling systems, for example, the constraints imposed by standards [1], [2] can be a real obstacle for designers, in both software and hardware, because the safety functions are considered to operate continuously and not only in low demand mode [1]. New technologies and hardware solutions for implementing safety functions are an added resource for designers, but they also introduce a further constraint: well-proven architectures are usually required to comply with the most widely used standards without incurring additional testing costs.
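
The practical difference between the two demand modes can be made concrete with the target failure measures of IEC 61508-1: in low demand mode a SIL is assigned from the average probability of dangerous failure on demand (PFDavg), while in high demand or continuous mode it is assigned from the average frequency of dangerous failure per hour (PFH). The Python sketch below encodes those bands; the function name `sil_for` and the mode labels are illustrative assumptions.

```python
from typing import Optional

# IEC 61508-1 target failure measures: (SIL, lower bound inclusive, upper bound exclusive).
# Low demand mode: average probability of dangerous failure on demand (PFDavg).
LOW_DEMAND_PFDAVG = [(4, 1e-5, 1e-4), (3, 1e-4, 1e-3), (2, 1e-3, 1e-2), (1, 1e-2, 1e-1)]
# High demand or continuous mode: average frequency of dangerous failure per hour (PFH).
CONTINUOUS_PFH = [(4, 1e-9, 1e-8), (3, 1e-8, 1e-7), (2, 1e-7, 1e-6), (1, 1e-6, 1e-5)]


def sil_for(measure: float, mode: str) -> Optional[int]:
    """Return the SIL band containing `measure`, or None if it lies outside all bands.

    `mode` is "low" (measure is a PFDavg) or "continuous" (measure is a PFH in 1/h).
    """
    if mode == "low":
        table = LOW_DEMAND_PFDAVG
    elif mode == "continuous":
        table = CONTINUOUS_PFH
    else:
        raise ValueError(f"unknown demand mode: {mode!r}")
    for sil, lower, upper in table:
        if lower <= measure < upper:
            return sil
    return None


if __name__ == "__main__":
    print(sil_for(2e-3, "low"))         # a PFDavg of 2e-3 falls in the SIL 2 band
    print(sil_for(2e-8, "continuous"))  # a PFH of 2e-8 per hour falls in the SIL 3 band
    print(sil_for(2e-3, "continuous"))  # far too high for any continuous-mode band
```

The two tables are not interchangeable: one holds a dimensionless probability, the other a rate per hour, which is why formulas derived for low demand mode do not transfer directly to continuously operating railway functions.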

Therefore, additional design parameters such as reliability, availability, maintainability and safety (RAMS) should be considered during the main design phases to ensure that the newly designed system can withstand a wide variety of application conditions. Some researchers have discussed the variation of system availability and reliability over time for specific uses related to railways or telecommunications [3], [4], [5], [6], [7] or to oil and gas systems [8], [9], with only limited safety considerations. In general, these works describe how RAM parameters change dynamically and provide reconfigurable models that can be exploited in design. Other authors [9], [10], [11] have tried to describe suitable roadmaps for defining system degradation models while keeping the safety requirements within upper and lower bounds. However, these approaches are generally complex and may be difficult for designers to follow in practice, where maintenance must also be taken into account. Advanced techniques such as Hidden Markov Models have made it possible to embed service-shop data in order to update the a priori failure and repair rates; in this way the model is fitted to the actual life of the system in service, which in principle allows the system parameters to be updated. In [12] the authors show how variations in the risk reduction factors (RRFs) may affect the design options and introduce a rough cost model to discriminate between the options on a cost basis. In [13] the authors propose an effective approach to the functional safety assessment of pre-crash systems for reciprocal hazards in the automotive field, based on suitable simulation models. Finally, [14], [15] discuss general strategies for addressing safety problems with traditional methods.

However, none of the papers mentioned above addresses the design of a flexible hardware architecture for sensor readout that can cope, with a single modular solution, with different safety requirements linked to different safety functions.
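
As a minimal illustration of the time-dependent availability models mentioned above (not a reproduction of any of the cited methods), consider a single repairable unit with constant failure rate λ and repair rate μ, modeled as a two-state Markov chain that starts in the up state. Its point availability is A(t) = μ/(λ+μ) + λ/(λ+μ)·exp(−(λ+μ)t), which decays to the steady-state value μ/(λ+μ). The rates used below are hypothetical.

```python
import math


def point_availability(t: float, lam: float, mu: float) -> float:
    """A(t) of a two-state (up/down) Markov unit that is up at t = 0.

    lam: constant failure rate (1/h); mu: constant repair rate (1/h).
    """
    s = lam + mu
    return mu / s + (lam / s) * math.exp(-s * t)


def steady_state_availability(lam: float, mu: float) -> float:
    """Limit of A(t) for t -> infinity."""
    return mu / (lam + mu)


if __name__ == "__main__":
    lam = 1e-4    # hypothetical failure rate, 1/h
    mu = 1 / 8    # hypothetical repair rate, 1/h (mean repair time of 8 h)
    for t in (0, 8, 24, 100):
        print(f"A({t} h) = {point_availability(t, lam, mu):.6f}")
    print(f"steady state = {steady_state_availability(lam, mu):.6f}")
```

Models with more states (degraded modes, redundant channels, imperfect repair) follow the same pattern but are usually solved numerically rather than in closed form.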

Other authors [16] have discussed safety instrumented systems operating in low demand mode, analyzing the impact of different testing strategies used in mechanical and industrial plants. The authors of [17] propose a simplified method for SIL evaluation based on a reliability block diagram degradation approach. This approach simplifies the formulas presented in [1], [2] and gives designers a useful tool for system design. However, although such studies introduce alternatives to the formulas provided in the most widely used standards, they have been applied only to cases with long proof-test intervals and cannot be used for continuous mode operation. In general case studies [18], [19], designers and researchers mostly focus on efficiency management and measurement performance, skipping most of the safety constraints introduced by specific application requirements. Some authors have analyzed the behavior of low demand mode versus high demand mode systems in terms of both proof-test interval changes and configuration management [20], [21]. In particular, [20] focused on the effectiveness of testing on a general instrumented system on the basis of the demand mode classification, while [21] optimized the proof-test interval for a specific selected architecture. However, neither [20] nor [21] specifically addressed the railway context, whose problems are peculiar and strictly application dependent.
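
The dependence on proof-test intervals that confines these methods to low demand mode can be seen in the simplified IEC 61508-6 expressions for a single 1oo1 channel: PFDavg ≈ λ_DU·T1/2 grows linearly with the proof-test interval T1, whereas the simplified high demand/continuous mode measure for the same channel is PFH ≈ λ_DU, which contains no T1 at all. The sketch below assumes a hypothetical λ_DU and neglects detected failures, repair times and common cause failures.

```python
def pfd_avg_1oo1(lambda_du: float, t1_h: float) -> float:
    """Simplified low-demand PFDavg of a 1oo1 channel: lambda_DU * T1 / 2.

    Assumes lambda_DU * T1 << 1; detected failures and repair time are neglected.
    """
    return lambda_du * t1_h / 2


def pfh_1oo1(lambda_du: float) -> float:
    """Simplified high demand / continuous mode PFH of a 1oo1 channel (per hour)."""
    return lambda_du


if __name__ == "__main__":
    lambda_du = 1e-7  # hypothetical dangerous undetected failure rate, 1/h
    # Proof-test intervals of roughly 1, 6, 12 and 24 months, in hours.
    for t1 in (730, 4380, 8760, 17520):
        print(f"T1 = {t1:>5} h  PFDavg = {pfd_avg_1oo1(lambda_du, t1):.2e}")
    print(f"PFH (independent of T1) = {pfh_1oo1(lambda_du):.1e} /h")
```

Lengthening T1 directly degrades the low-demand figure, which is why proof-test optimization is meaningful there; in continuous mode the channel failure rate itself dominates.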

The authors of this manuscript present an analysis, performed within the boundaries of relevant international safety standards [1], [2], that suggests a solution suitable whenever a safety design requirement is established a priori.

Specifically, the authors start from a basic architecture composed of a sensing element, a logic unit devoted to data manipulation and management, and a final actuating/output element, and develop a modular solution able to meet a commonly required safety integrity level (SIL, as defined in [2]), established in particular for railway applications. In railway applications, safety requirements are generally selected according to [1], [2] with a higher rank (SIL 3 or 4) than in other fields. The proposed modular solutions are then evaluated in terms of availability parameters, and the best configuration according to these parameters is proposed. The purpose and novelty of this manuscript lie in providing useful guidance for designers who have to deal with continuous monitoring systems, from very simple architectures up to complex structures, in particular for railway applications, where standard configurations can be exploited and further enhanced for specific signaling interfacing systems.
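
From the availability standpoint, the three-section structure described above (sensing element, logic unit, output element) behaves as a series reliability block diagram: the chain delivers its output only if every section does. The sketch below composes k-out-of-n section models under simplifying assumptions (identical, independent channels, no common cause failures, perfect voting); the class name `Section` and all numerical values are hypothetical. It captures availability only and does not distinguish safe from dangerous failures, which the MooN voting notation of IEC 61508 does.

```python
from dataclasses import dataclass
from math import comb, prod
from typing import Sequence


@dataclass(frozen=True)
class Section:
    """One section of the readout chain with k-out-of-n redundant channels (hypothetical model)."""
    name: str
    k: int    # channels that must be working for the section to work
    n: int    # channels installed
    a: float  # steady-state availability of a single channel

    def availability(self) -> float:
        # Probability that at least k of n independent, identical channels are up.
        return sum(
            comb(self.n, i) * self.a**i * (1 - self.a) ** (self.n - i)
            for i in range(self.k, self.n + 1)
        )


def chain_availability(sections: Sequence[Section]) -> float:
    # Series RBD: the chain is available only if every section is available.
    return prod(s.availability() for s in sections)


if __name__ == "__main__":
    simplex = [
        Section("sensor", 1, 1, 0.999),
        Section("logic", 1, 1, 0.9999),
        Section("output", 1, 1, 0.9995),
    ]
    # Same chain with a 2-of-3 redundant input stage.
    redundant_input = [Section("sensor", 2, 3, 0.999)] + simplex[1:]
    print(f"single-channel chain:  {chain_availability(simplex):.6f}")
    print(f"2-of-3 input stage:    {chain_availability(redundant_input):.6f}")
```

Swapping the configuration of one section and recomputing the product is the kind of comparison the modular approach relies on, although a real assessment would add common cause and diagnostic coverage terms.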

The paper is organized in six sections. Section 1 provides a general introduction to the problem and reviews the state of the art of hardware safety design approaches in different application fields. Section 2 gives a general description and overview of the system. In Section 3 a selected architecture is proposed and different basic configurations are compared in terms of RAMS characteristics; maintenance policies are not discussed and only corrective actions are assumed. Section 4 presents and discusses the simulation results, while Section 5 proposes a modular hardware solution able to cover different safety function requirements. The results highlight that more complex systems may, as expected, show lower reliability/availability figures while providing a satisfactory degree of protection and a reduced residual risk. Section 6 presents the conclusions.

Section snippets

System description

The proposed basic architecture is a fault-tolerant smart front-end system for safety-critical applications in industrial processes or in the railway field, supporting severe configuration and response-time requirements. The system works in centralized and distributed configurations, with a modular redundant (MR) architecture designed to eliminate single points of failure and ensure the required system availability.
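
The snippet does not detail how the redundant channels are combined. As a hypothetical illustration of how a triple-channel arrangement masks a single faulty channel, the Python sketch below implements a 2-out-of-3 majority vote for binary inputs and a mid-value (median) select for analog readings, and reports which channel disagrees so that it can be flagged for corrective maintenance. The function names and readings are assumptions, not the authors' design.

```python
from typing import Optional, Sequence, Tuple


def vote_2oo3(a: bool, b: bool, c: bool) -> Tuple[bool, Optional[int]]:
    """Majority-vote three redundant binary readings.

    Returns the voted value and the index (0, 1 or 2) of the dissenting
    channel, or None when all three channels agree.
    """
    voted = (a and b) or (a and c) or (b and c)
    dissenting = [i for i, r in enumerate((a, b, c)) if r != voted]
    # With three binary inputs at most one channel can disagree with the majority.
    return voted, (dissenting[0] if dissenting else None)


def mid_value_select(readings: Sequence[float]) -> float:
    """Return the median of three analog readings, so one outlier cannot drive the output."""
    if len(readings) != 3:
        raise ValueError("mid-value select expects exactly three readings")
    return sorted(readings)[1]


if __name__ == "__main__":
    print(vote_2oo3(True, True, False))          # channel 2 stuck low is outvoted
    print(mid_value_select([4.02, 4.00, 19.7]))  # hypothetical current-loop readings in mA
```

Masking keeps the output correct after a single channel fault, while the dissent index supports the corrective action that restores full redundancy.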

The system can operate correctly in the presence of a major component fault and

Architectures modeling

Once the basic structure is set, the problem is to define the redundancy of the three main sections of Fig. 1 so as to meet the requirements of the multiple safety functions usually present in such systems. Suitable configurations are analyzed against the requirements of the IEC 61508 [2] safety standard (safety integrity level, SIL) for designing local or distributed systems that collect data from field sensors and subsequently process them for control purposes. The design will

Single line 1oo1

The architecture shown in Fig. 5 is the simplest, in terms of configuration blocks, that can be implemented to cover a safety function. It is discussed only as an example because this arrangement does not meet the fault tolerance requirements; it is therefore not considered in the following analysis. Using the data of Table 1, the system PFD can be computed as

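$$PFD_{S} = 4.0038 \times 10^{-6}, \qquad PFD_{LS} = 1.304 \times 10^{-6}, \qquad PFD_{FE} = 3.5985 \times 10^{-7}$$

$$PFD_{SYS} = PFD_{S} + PFD_{LS} + PFD_{FE} = 5.6676 \times 10^{-6}$$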
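
These figures can be checked with a few lines of Python. Reading the subscripts as sensor (S), logic solver (LS) and final element (FE), which matches the three sections of the basic architecture, a 1oo1 chain fails dangerously if any subsystem does. Assuming independent subsystems, the exact system PFD is 1 − Π(1 − PFD_i); for values this small it is indistinguishable from the simple sum used in IEC 61508-6.

```python
from math import prod

# Subsystem PFD values of the 1oo1 chain, as reported in the text.
pfd = {"S": 4.0038e-6, "LS": 1.304e-6, "FE": 3.5985e-7}

# Rare-event approximation: the system PFD is the sum of the subsystem PFDs.
pfd_sum = sum(pfd.values())

# Exact probability that at least one independent subsystem has failed.
pfd_union = 1 - prod(1 - p for p in pfd.values())

print(f"sum of subsystem PFDs: {pfd_sum:.5e}")
print(f"exact union:           {pfd_union:.5e}")
print(f"relative difference:   {(pfd_sum - pfd_union) / pfd_union:.1e}")
```

The sum slightly overestimates the exact value, so it is the conservative choice; the discrepancy only becomes relevant for subsystem PFDs several orders of magnitude larger than these.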

Sensor interface proposed architecture

Table 3 summarizes the expected number of failures and the availability figures of the proposed configurations.
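
Under the corrective-maintenance-only assumption adopted in Section 3, a common first-order way to obtain such figures for a repairable unit with constant failure rate λ is the steady-state availability A = MTTF/(MTTF + MTTR) together with a long-run expected number of failures of about T/(MTTF + MTTR) over a mission time T, which tends to λT when MTTR ≪ MTTF. The sketch below applies these formulas with hypothetical parameters and is not claimed to reproduce Table 3, whose simulation model is not given in this section.

```python
def steady_state_availability(mttf_h: float, mttr_h: float) -> float:
    """Long-run fraction of time a repairable unit is up."""
    return mttf_h / (mttf_h + mttr_h)


def expected_failures(mission_h: float, mttf_h: float, mttr_h: float) -> float:
    """Long-run expected number of failures: one per up/down cycle of length MTTF + MTTR."""
    return mission_h / (mttf_h + mttr_h)


if __name__ == "__main__":
    failure_rate = 2e-5       # hypothetical constant failure rate, 1/h
    mttf = 1 / failure_rate   # about 50,000 h
    mttr = 24.0               # hypothetical corrective repair time, h
    mission = 20 * 8760       # hypothetical 20-year service life, h
    print(f"A = {steady_state_availability(mttf, mttr):.6f}")
    print(f"E[N] = {expected_failures(mission, mttf, mttr):.3f} "
          f"(lambda*T = {failure_rate * mission:.3f})")
```

For a redundant configuration the same quantities would be computed per channel and then combined through the configuration's block diagram or Markov model.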

Based on the above considerations, and on the trade-off among system availability, degree of safety and system configurability, a possible solution has been identified, at least for the input stage, which proved to be the most critical section. The Triple Modular Redundant (TMR) architecture ensures fault tolerance and provides error-free, uninterrupted control in the

Conclusions

In this paper the authors described a possible solution for sensor-based smart front-end architectures to be used in railway applications that must meet an established SIL. The authors analyzed several possible configurations in terms of system availability and safety requirements, ending with the proposal of a solution representing a trade-off among the reference parameters considered. A triple redundant module for the sensing input interface represents a possible implementation

References (21)

A. Fort et al. Hidden Markov models approach used for life parameters estimations. Reliab Eng Syst Saf (2015).
H. Guo et al. Automatic creation of Markov models for reliability assessment of safety instrumented systems. Reliab Eng Syst Saf (2008).
H. Jin et al. Reliability of safety-instrumented systems subject to partial testing and common-cause failures. Reliab Eng Syst Saf (2014).
L. Ding et al. A novel method for SIL verification based on system degradation using reliability block diagram. Reliab Eng Syst Saf (2014).
Y. Liu et al. Reliability assessment of safety instrumented systems subject to different demand modes. J Loss Prev Process Ind (2011).
A.C. Torres-Echeverría et al. Modelling and optimization of proof testing policies for safety instrumented systems. Reliab Eng Syst Saf (2009).
CENELEC 50129 and CENELEC...
IEC61508-1-6. Functional safety of electrical/electronic/programmable electronic safety related...
T. Babczyński, J. Magott. Dependability and safety analysis of ERTMS level 3 using analytic estimation safety and...
A. Fort et al. Availability modeling of a safe communication system for rolling stock applications. In: Proceedings of...


