Fault tolerant design of a field data modular readout architecture for railway applications

https://doi.org/10.1016/j.ress.2015.06.008

Highlights

  • The possibility of providing designers with a standard sample safe solution.

  • Addressing the design concept of a readout interface according to widely adopted standards.

  • Comparison of multiple solutions in terms of RAMS parameters.

  • Providing designers with a general, simple methodology for approaching similar problems.

Abstract

Modern data acquisition systems used to collect sensor signals are usually designed with performance and operating parameters in mind, mainly sensitivity, selectivity, resolution and stability over time. In addition to these important features, field application systems must also satisfy other constraints such as reliability and availability and, depending on the specific application, particular safety requirements. This paper provides an overview of the implications that parameters such as the safety integrity level have on the design of a sensor input/output hardware module. The discussion covers the overall system design once availability considerations are integrated. Considerations concerning the on-board software implementation are omitted without loss of generality. The study has been developed taking into account solutions suitable for railway applications such as signaling or crossing detection systems.

Introduction

The design of a sensor acquisition system able to operate in harsh environments and to satisfy severe constraints, not only on data acquisition performance but also on robustness in common use, is still an open research field. Even though compact, field-distributed acquisition systems are used in a wide variety of industrial contexts, ranging from telecommunications to oil field monitoring, safety requirements are in most cases application dependent. In railway signaling systems, for example, the constraints imposed by some standards [1], [2] can be a real obstacle for designers in terms of both software and hardware, because the safety functions are considered to operate continuously and not only in low demand mode [1]. The introduction of new technologies and new hardware solutions to cover safety functions is on the one hand an added resource for designers, but on the other hand constitutes a further constraint. The reason is that well-proven architectures are usually required to comply with the most widely used standards while avoiding additional testing costs.

Therefore, additional design parameters such as system reliability, availability, maintainability and safety (RAMS) should be considered during the main design phases to ensure that the newly designed system can withstand a wide variety of application conditions. Some researchers have discussed the variation of system availability and reliability over time for specific uses related to railways or telecommunications [3], [4], [5], [6], [7] or to oil and gas systems [8], [9], with only limited safety considerations. These works generally describe how RAM parameters can change dynamically, providing exploitable reconfigurable models. Other authors, as in [9], [10], [11], have described roadmaps for defining degradation models of the systems while keeping the safety requirements constrained within upper and lower boundaries. Nevertheless, in these cases the authors generally provide complex approaches that may be difficult for designers to follow in practice, where maintenance considerations should also be taken into account. The introduction of advanced techniques such as Hidden Markov Modeling allowed service shop activities to be embedded in the model in order to retrofit the a priori assumed failure and repair rates. In this way, the model is fitted to the actual system life under effective use. Such an effort is useful in principle because it allows the system parameters to be retrofitted. Other authors, as in [12], show how variations in the risk reduction factors (RRFs) may affect the design options, and introduce a rough cost model to discriminate between the options on a cost basis. In [13] the authors provide an effective approach to the functional safety assessment of pre-crash systems for reciprocal hazards in the automotive field, building suitable simulation models. Finally, in [14], [15] the authors discuss general strategies for addressing safety problems using traditional methods. Nevertheless, none of the previously mentioned papers addresses the issue of designing a sensor readout hardware architecture flexible enough to cope with different safety requirements (linked to different safety functions) with a single modular solution.

Some other authors [16] have discussed the case of safety instrumented systems in low demand mode, analyzing the impact of different testing strategies used in mechanical and industrial plants. The authors of [17] propose an interesting and simplified method for safety integrity level evaluation based on a reliability block diagram degradation approach. This approach simplifies the formulas presented in [1], [2], supplying designers with a useful tool to be exploited during system design. Nevertheless, this kind of research, even though it introduces alternatives to the formulas provided in the most widely used standards, has been applied only in cases with long proof test intervals and cannot be exploited for continuous operation mode. In general case studies [18], [19], designers and researchers mostly focus on efficiency management and measurement performance, skipping most of the considerations on safety constraints introduced by the specific application requirements. Some authors have analyzed the behavior of low demand mode versus high demand mode systems in terms of both proof test interval changes and configuration management [20], [21]. In particular, while [20] focused on the effectiveness of testing on a general instrumented system on the basis of the demand mode classification, [21] tried to optimize the proof test interval for a specific selected architecture. Nevertheless, neither [20] nor [21] specifically addressed problems related to the railway context, which are very peculiar and strictly application dependent.

The authors of this manuscript present an analysis performed within the boundaries of some relevant international safety standards [1], [2] to suggest one solution suitable for exploitation whenever an a priori safety design requirement is established.

In detail, this paper starts from a basic architecture composed of the sensing element, a logical unit devoted to data manipulation and management, and a final actuating/output element, and develops a modular solution able to cope with a commonly required safety integrity level (SIL according to the standard [2]), established in particular for railway applications. In these cases safety requirements are generally selected according to [1], [2] with a higher rank (SIL 3 or 4) than in other fields. The proposed modular solutions have then been evaluated in terms of availability parameters, and the best configuration in terms of such parameters has been proposed. The purpose and novelty of this manuscript reside in providing useful guidance for designers who have to deal with continuous monitoring systems, starting from very simple architectures up to complex structures exploitable in particular for railway applications, where standard configurations can be exploited and further enhanced for specific signaling interfacing systems.

The paper is arranged in six sections. Section 1 gives a general introduction to the problem, with an indication of the state of the art of hardware safety design approaches in different application fields. Section 2 gives a general system description and overview. In Section 3 a selected architecture is proposed and different basic configurations are compared in terms of RAMS characteristics, excluding any discussion of maintenance policies and assuming only corrective actions. In Section 4 the simulation results are shown and discussed, while in Section 5 a possible modular hardware solution able to cover different safety function requirements is proposed. The results highlight in particular that a complex system may present lower reliability/availability figures while at the same time providing a satisfactory degree of protection and reduced residual risk. In Section 6 the conclusions are presented.

Section snippets

System description

The proposed basic architecture is a fault tolerant smart front end system for safety-critical applications in industrial processes or the railway sector, supporting severe configuration and response time requirements. The system works with centralized and distributed configurations, with a modular redundant (MR) architecture to eliminate single points of failure and to ensure the required system availability.

The system can operate correctly with the presence of a major component fault and

Architectures modeling

Once the basic structure is set, the problem is to define the redundancy of the three main sections of Fig. 1 in order to meet the requirements of the multiple safety functions usually present in such systems. The analysis of suitable configurations is developed to meet the requirements of the IEC 61508 [2] safety standard (safety integrity level, SIL) in designing local or distributed systems for data collection from field sensors and subsequent manipulation for control purposes. The design will

Single line 1oo1

The architecture shown in Fig. 5 is the simplest, in terms of system configuration blocks, that can be implemented to cover a safety function. It is discussed only as an example because this specific arrangement does not meet the fault tolerance requirements. It will therefore not be considered in the following analysis. Using the data of Table 1, the system PFD can be defined as

  • PFD_S = 4.0038E-06;

  • PFD_LS = 1.304E-06;

  • PFD_FE = 3.5985E-07;

  • System PFD = 5.6676E-06
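
The figures listed above are consistent with a system PFD obtained by adding the three subsystem contributions, as is usual for a series chain under the rare-event approximation. The short sketch below reproduces that arithmetic. The reading of the subscripts (S for the sensing element, LS for the logic section, FE for the final element) and the function name `system_pfd` are assumptions made for illustration and are not taken from the full text.

```python
# Subsystem PFD values quoted in the list above for the 1oo1 configuration.
# The meaning attached to each subscript is an assumption (see lead-in).
pfd_sensor = 4.0038e-06         # subscript S, assumed: sensing element
pfd_logic = 1.304e-06           # subscript LS, assumed: logic section
pfd_final_element = 3.5985e-07  # subscript FE, assumed: final element


def system_pfd(*subsystem_pfds: float) -> float:
    """Add subsystem PFDs (series chain, rare-event approximation)."""
    return sum(subsystem_pfds)


total = system_pfd(pfd_sensor, pfd_logic, pfd_final_element)
print(f"System PFD = {total:.5E}")
```

The sum agrees with the reported system value to within rounding of the last digit, which suggests that no additional term enters the 1oo1 figure.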

Sensor interface proposed architecture

In Table 3 a summary of the expected number of failures and availability figures of the proposed configurations is reported.

On the basis of the previous considerations and of the tradeoff among system availability, safety degree and system configurability, a possible solution has been identified, at least for the input stage, which proved to be the most critical section. The Triple Modular Redundant (TMR) architecture ensures fault tolerance and provides error-free, uninterrupted control in the
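
The fault masking that a TMR arrangement relies on can be illustrated with a minimal 2-out-of-3 majority voter. This is a sketch under stated assumptions, not the authors' implementation: the function name `vote_2oo3` and the representation of each channel output as an integer are hypothetical.

```python
from collections import Counter
from typing import Optional, Sequence


def vote_2oo3(readings: Sequence[int]) -> Optional[int]:
    """Return the value reported by at least two of three channels.

    Returns None when all three channels disagree, i.e. when the
    voter cannot mask the fault and the caller must react.
    """
    if len(readings) != 3:
        raise ValueError("a 2oo3 voter expects exactly three channel readings")
    value, count = Counter(readings).most_common(1)[0]
    return value if count >= 2 else None


# One faulty channel (the third) is outvoted by the two healthy ones.
masked = vote_2oo3([1, 1, 0])
# Three different readings: no majority, so no valid output.
unresolved = vote_2oo3([1, 2, 3])
```

A single wrong channel is outvoted and the output stays valid, which is the sense in which the arrangement tolerates one fault. A real voter for analog sensor inputs would compare readings within a tolerance band rather than for exact equality.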

Conclusions

In this paper the authors described a possible solution for sensor-based smart front end architectures to be used in railway applications that must match an established SIL. The authors analyzed several possible configurations in terms of system availability and safety requirements, ending with the proposal of a solution representing the tradeoff among the reference parameters considered. A triple redundant module for the sensing input interface represents a possible implementation
