Elsevier

Journal of Systems and Software

Volume 123, January 2017, Pages 145-159
Distributed architecture for developing mixed-criticality systems in multi-core platforms

https://doi.org/10.1016/j.jss.2016.08.088

Highlights

  • An approach to use DDS and hypervisor technologies in multi-core systems is proposed.

  • We identify and address the communication challenges arising from our approach.

  • A comprehensive analysis of available architectural configurations is provided.

  • We identify the DDS configurations compatible with ARINC-like partitioned systems.

  • Our approach is validated through a case-study from the energy domain.

Abstract

Partitioning is a widespread technique that enables the execution of mixed-criticality applications on the same hardware platform. New challenges for the next generation of partitioned systems include the use of multiprocessor architectures and distribution standards in order to open up this technique to a heterogeneous set of emerging scenarios (e.g., cyber-physical systems). This work describes a system architecture that enables the use of data-centric distribution middleware in partitioned real-time embedded systems based on a hypervisor for multi-core platforms, and it focuses on the analysis of the available architectural configurations. We also present an application case study to evaluate the different configurations and identify the possible trade-offs among them.

Introduction

When multiple functionalities are integrated in the same embedded platform, it is highly likely that some of them will be more critical to the survival of the system than others. Depending on the required degree of safety, distinct levels of criticality are identified by domain-specific standards such as DO-178B or ED-12B for avionics, IEC 880 for nuclear plants, or IEC 61400 for wind turbines. A mixed-criticality system is one that contains two or more software applications with different criticality levels, and in which failures in higher-criticality applications may lead to unacceptable consequences (e.g., financial loss, environmental harm, personal disasters or severe damage to equipment). For instance, the safety concept for a wind turbine control system can be to ensure a safe operational state according to the design limits associated with the turbine. At the same time, this control system can provide other functionalities such as real-time control of the turbine, communications with maintenance operators or management of video-surveillance cameras.

The integration of mixed-criticality applications is becoming increasingly common in the development of complex embedded systems (MULTIPARTES, 2013). In some domains, this integration is often required to satisfy non-functional requirements related to cost, weight or power consumption. The development of this kind of application can be enabled by means of strict space and time partitioning such as that proposed by the ARINC-653 specification (Airlines Electronic Engineering Committee, 2006), which defines an API called APplication EXecutive (APEX) that allows multiple applications with different safety levels to be executed on the same hardware platform (a core module in the context of ARINC-653). One promising approach to building partitioned systems is based on a hypervisor (Han and Jin, 2014), a minimal layer of software with a low overhead that supports mixed-criticality partitions built on top of operating systems with different purposes.

At the same time, embedded computing is also shifting to multi-core architectures as they are able to provide higher computational capabilities in less space. A recent study estimates that about 45% of industrial embedded applications will rely on multiprocessor architectures beyond 2015, and up to 95% of them will integrate several criticality levels (Ernst, 2010). However, multiprocessor architectures still have to address some challenges to be used in safety-critical applications (MULTIPARTES, 2013; parMERASA, 2014).

Unlike traditional approaches for developing partitioned systems, hypervisor technology supports the execution of multiple operating systems on the same hardware platform, which facilitates the use of commercial-off-the-shelf (COTS) components. Furthermore, as ubiquitous connectivity increasingly penetrates traditional domains such as automotive or energy (MULTIPARTES, 2013), new mechanisms may be required to deal efficiently and transparently with the communication requirements of mixed-criticality applications. As a result of both trends, the use of middleware technology is starting to be seen as a potential solution (Dubey et al., 2011; Technical Standard for Future Airborne Capability Environment, 2014), although it still has to overcome the complexity traditionally associated with it. To this end, distribution standards are evolving towards safety-critical subsets of their full distribution facilities. This is the case for the safety-critical profile for the Data Distribution Service for Real-Time Systems (DDS) (Object Management Group, 2007), which is currently under development.

The DDS standard supports a comprehensive set of quality of service (QoS) parameters that allows fine control over non-functional properties. This has led to a large number of developments on distributed real-time applications using DDS, such as Kang et al. (2012) for cyber-physical systems, Hakiri et al. (2014) for cloud computing or Albano et al. (2015) for smart grids. Furthermore, the use of DDS in safety-critical systems is also attracting a high degree of interest. For instance, the Technical Standard for Future Airborne Capability Environment (2014) (FACE) aims to bring software interoperability to avionic systems through open standard solutions, and it includes DDS as one of the suitable candidates to develop FACE-compliant distributed systems.

As a result, handling distinct levels of criticality in software systems from different industrial sectors can be a typical scenario in the future of embedded systems engineering. However, even when the adoption of multi-core and distribution middleware provides significant benefits to mixed-criticality systems, some challenges remain open and should be further analysed. From the communication perspective, the main challenges are:

Increasing the communication responsiveness of partitioned systems. Traditional approaches to building ARINC-like partitioned systems often suffer from a significant loss of performance in communications as a result of time partitioning (Pérez and Gutiérrez, 2016). This loss of responsiveness may become an impediment to taking advantage of the benefits of partitioning in modern complex embedded systems from different industries, as they are increasingly networked and might even require global connectivity. In this context, enhanced communication mechanisms are needed to develop distributed partitioned systems able to satisfy different safety, real-time and communication requirements. The use of multi-core platforms opens up the possibility of executing several partitions in parallel, which may serve as a basis for the development of these new communication mechanisms.

Use of standard communication middleware in partitioned systems. The emerging safety-critical profiles of distribution standards and the adoption of hypervisor technology may facilitate the use of COTS middleware in ARINC-like partitioned systems. However, this development is not straightforward and requires further analysis, as the virtualization of the network resources may compromise the use of distribution middleware on top of the hypervisor technology.
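The first challenge above can be illustrated with a minimal sketch. It is not the analysis used in this work: it assumes a toy cyclic schedule in which each partition only runs inside its own time windows, a hypothetical 100 ms major frame, and a hypothetical 1 ms handling cost per message. All names (`Window`, `next_service_time`, `round_trip`) are introduced here for illustration only.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Window:
    start: int   # offset of the window inside the major frame (ms)
    length: int  # duration of the window (ms)


def next_service_time(t: int, windows: List[Window], major_frame: int) -> int:
    """Earliest time >= t at which the partition owning `windows` is running."""
    frame_index, offset = divmod(t, major_frame)
    base = frame_index * major_frame
    for w in sorted(windows, key=lambda w: w.start):
        if w.start <= offset < w.start + w.length:
            return t                  # partition is already running
        if offset < w.start:
            return base + w.start     # wait for a later window in this frame
    # No window left in this frame: wait for the first window of the next one.
    return base + major_frame + min(w.start for w in windows)


def round_trip(t0: int, sender: List[Window], receiver: List[Window],
               major_frame: int, cost: int = 1) -> int:
    """Request/reply delay when each side can only act inside its windows."""
    t_reply = next_service_time(t0, receiver, major_frame) + cost
    t_done = next_service_time(t_reply, sender, major_frame) + cost
    return t_done - t0


MAJOR_FRAME = 100  # ms, hypothetical

# Single core: the two partitions alternate inside the major frame.
single_core_rtt = round_trip(10, [Window(0, 50)], [Window(50, 50)], MAJOR_FRAME)

# Two cores: each partition owns a whole core, so both run in parallel.
multi_core_rtt = round_trip(10, [Window(0, 100)], [Window(0, 100)], MAJOR_FRAME)

print(single_core_rtt, multi_core_rtt)
```

Under these assumed values, the single-core round trip is dominated by waiting for the peer partition's window and then for the sender's next window, whereas in the two-core layout only the handling cost remains. The sketch ignores contention on shared resources, which a real multi-core platform would also have to bound.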

To address these challenges, this paper identifies feasible architectural configurations that enable the use of middleware and hypervisor technologies in multiprocessor systems. In particular, this work makes the following contributions:

  • Identification and analysis of challenges for partitioned distributed systems based on multi-core platforms

    This work identifies the communication challenges that need to be addressed when integrating multi-core architectures in partitioned distributed systems, and it also explores how to address them in order to increase the communication responsiveness of this kind of system without compromising its isolation features. Furthermore, this paper outlines the restrictions that distribution middleware should cope with in order to interconnect partitions regardless of whether virtual or physical communication resources are used.

  • Assessment of the feasibility of the approach

    This work proposes a partitioned distributed real-time platform that integrates standard distribution middleware and hypervisor technologies. Furthermore, a representative case study in the energy domain is included as a proof of concept in order to estimate the overheads incurred by our approach, as well as to identify the possible trade-offs among the proposed architectural configurations.

The remainder of this document is structured as follows. Section 2 presents the related work. The basic concepts of DDS and the hypervisor technology used in this work are introduced in Section 3. Section 4 explores the different system architectures that enable the use of standard distribution middleware and partitioning over multi-core architectures. The distributed real-time platform for partitioned multi-core systems is presented in Section 5. Then, Section 6 details a case-study used to evaluate the effectiveness of the proposed approach. Finally, Section 7 draws the conclusions and outlines the future work.

Section snippets

Related work

In recent years, an increasing amount of research has been done on the use of distribution middleware in mixed-criticality environments. For instance, the European Space Agency (ESA) has developed a set of tools (Perrotin et al., 2010) to support the development of safety-related applications in the aerospace domain. This set of tools relies on a minimal middleware implementation which can be tailored to each target application through code-generation tools. Other approaches are based on

Overview of DDS

The DDS standard (Object Management Group, 2007) defines a decentralised middleware architecture for anonymous, asynchronous and decoupled communications among publishers (i.e., suppliers of data) and subscribers (i.e., sinks of data). It is based on a global data space where data may flow from one or many publishers to one or many subscribers. The data exchanged within the global data space are defined by means of topics, and subscribers require registration of their interest in receiving
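The topic-based, anonymous publish/subscribe model outlined in this overview can be sketched with a toy in-memory data space. This is not the DDS API: the class `ToyDataSpace`, its methods and the topic name `TurbineSpeed` are hypothetical, delivery is synchronous for simplicity (DDS communication is asynchronous), and QoS is not modelled at all.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, List


class ToyDataSpace:
    """Toy global data space: topics decouple publishers from subscribers."""

    def __init__(self) -> None:
        # topic name -> callbacks registered by interested subscribers
        self._readers: DefaultDict[str, List[Callable[[Any], None]]] = defaultdict(list)

    def subscribe(self, topic: str, on_data: Callable[[Any], None]) -> None:
        """Register interest in a topic; the subscriber never names a publisher."""
        self._readers[topic].append(on_data)

    def publish(self, topic: str, sample: Any) -> int:
        """Deliver a sample to every reader of the topic; return how many got it."""
        readers = self._readers.get(topic, [])
        for on_data in readers:
            on_data(sample)
        return len(readers)


space = ToyDataSpace()
received_a: List[Any] = []
received_b: List[Any] = []

# Two independent subscribers register for the same (hypothetical) topic.
space.subscribe("TurbineSpeed", received_a.append)
space.subscribe("TurbineSpeed", received_b.append)

delivered = space.publish("TurbineSpeed", 12.5)  # one publisher, many subscribers
ignored = space.publish("Unwatched", 1)          # no registered interest

print(delivered, ignored, received_a, received_b)
```

The publisher only names a topic, so subscribers can be added or removed without changing the publishing code, which is the decoupling property described above.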

System architecture

Partitioning represents a convenient approach to enable partitions to be certified in isolation to their specific level of criticality. However, partitions may need to communicate with each other, and they may also require access to shared devices such as network cards. These aspects cause dependencies among partitions which may influence both the space and the time partitioning (Rierson, 2013):

  • Spatial partitioning requires mechanisms to ensure authorized transmission of data from one

The distributed, partitioned platform

This section describes our proposal for a distributed real-time platform for partitioned systems. In particular, the platform consists of a DDS implementation called RTI Connext Micro, a real-time operating system called MaRTE OS (Aldea and González, 2001) which follows the POSIX.13 minimal real-time system profile, and the aforementioned XtratuM as the hypervisor. The development of this platform has focused on validating the feasibility of the proposed system architecture. To this end, the

Evaluation

This section aims to evaluate the effectiveness of the proposed configurations in a simulated wind power plant (inspired by an industrial use case in MULTIPARTES (2013)), which is depicted in Fig. 5. A wind farm can be composed of hundreds of interconnected wind turbines, each of them with a supervisory unit which provides the following functionalities:

  • Control. The supervisory unit gathers a variety of on-line data to provide real-time control over the wind turbine. Among others, it receives

Conclusions and future work

Modern industrial embedded applications already integrate several functionalities with different criticality levels, but they are starting to migrate to multi-core architectures and their need for ubiquitous connectivity is growing swiftly. This scenario leads to ever more heterogeneous systems which must preserve their criticality requirements and may need global connectivity to execute external services at the same time. In this context, the use of multi-core platforms and standard

Acknowledgment

This work has been funded in part by the Spanish Government and FEDER funds under grant numbers TIN2011-28567-C03-02/TIN2011-28567-C03-03 (HIPARTES) and TIN2014-56158-C4-1-P/TIN2014-56158-C4-2-P (M2C2), and by the European Commission under grant number FP7 ICT 610640 (DREAMS).

References (50)

  • V. Brocal et al.

    Xoncrete: a scheduling tool for partitioned real-time systems

  • A. Burns et al.

    Mixed Criticality Systems – A Review

    (2014)
  • E. Carrascosa et al.

    Xtratum hypervisor redesign for LEON4 multicore processor

    SIGBED Rev.

    (2014)
  • Y. Cho et al.

    An integrated management system of virtual resources based on virtualization API and data distribution service

  • B. Cilku et al.

    Towards temporal and spatial isolation in memory hierarchies for mixed-criticality systems with hypervisors

  • A. Crespo et al.

    Partitioned embedded architecture based on hypervisor: the XtratuM approach

  • Design of embedded mixed-criticality CONTRol systems under consideration of EXtra-functional properties (CONTREX)
  • Distributed REal-time Architecture for Mixed Criticality Systems (DREAMS)
  • A. Dubey et al.

    A software platform for fractionated spacecraft

  • A. Dubey et al.

    A component model for hard real-time systems: CCM with ARINC-653

    Softw. Pract. Exp.

    (2011)
  • R. Ernst

    Certification of trusted MPSoC platforms

  • J. Galizzi et al.

    LVCUGEN (TSP-based solution) and first porting feedback

  • M. Garcia-Valls et al.

    Analyzing point-to-point DDS communication over desktop virtualization software

    Comput. Stand. Interfaces

    (2016)
  • Z. Gu et al.

    A state-of-the-art survey on real-time issues in embedded systems virtualization

    J. Softw. Eng. Appl.

    (2012)
  • S. Han et al.

    Resource partitioning for integrated modular avionics: comparative study of implementation alternatives

    Softw. Pract. Exp.

    (2014)

    Héctor Pérez Tijero has been participating in intense teaching and research activity in the Electronics and Computers Department at the University of Cantabria (Spain) since 2008. He received his M.Sc. and Ph.D. in 2008 and 2012, respectively. His Ph.D. was concerned with the integration of a real-time model into distribution middleware to facilitate the development process of distributed real-time systems. He works in software engineering for real-time systems and has been involved in several research and industrial projects using emerging distribution middleware technologies to build distributed and deterministic applications.

    J. Javier Gutiérrez received his B.Sc. and Ph.D. from the University of Cantabria (Spain) in 1989 and 1995 respectively. He has been an associate professor in the Computers and Real-Time Group at the University of Cantabria since 1996, where he works in software engineering for real-time. His research activity deals with the scheduling, analysis and optimization of embedded real-time distributed systems (including communication networks). He has been involved in several research projects building real-time controllers for robots, evaluating Ada for real-time applications, developing middleware for real-time distributed systems, and proposing both models and analysis and optimization techniques for distributed real-time applications.

    Salvador Peiró Frasquet is a software engineer working on the software development and testing of embedded partitioned systems. He received his Master's degree in Computer Engineering in 2011 from the Polytechnic University of Valencia (UPV) and his PhD in Computer Science in 2016 from the same university. His main research topics relate to the verification of the security properties of embedded operating systems, such as the Linux kernel, and of virtualization technologies, such as the XtratuM hypervisor.

    Alfons Crespo is a Professor in the Department of Computer Engineering of the Technical University of Valencia. He received his Ph.D. in Computer Science from the Technical University of Valencia, Spain, in 1984. He became an associate professor in 1986 and a full professor in 1991. He leads the Industrial Informatics group and has been responsible for several European and Spanish research projects. His main research interests include different aspects of real-time systems (scheduling, hardware support, scheduling and control integration,…). He has published more than 60 papers in specialised journals and conferences in the area of real-time systems.
