Advanced Primary–Backup Platform with Container-Based Automatic Deployment for Fault-Tolerant Systems

Lee, Jaemyoun; Jeong, Haegeon; Lee, Won-Joo; Suh, Hyo-Joong; Lee, Dongeun; Kang, Kyungtae

doi:10.1007/s11277-017-4282-4

Published: 27 April 2017

Volume 98, pages 3177–3194 (2018)

Wireless Personal Communications

Jaemyoun Lee¹, Haegeon Jeong¹, Won-Joo Lee², Hyo-Joong Suh³, Dongeun Lee⁴ & Kyungtae Kang¹ (ORCID: orcid.org/0000-0002-6587-7044)

Abstract

In mission-critical systems, the primary–backup scheme is a desirable approach for improving reliability and fault tolerance, because it helps ensure a high mission success rate despite unexpected errors. However, the scheme must maintain consistency between the primary and the backup so that the backup can take over whenever the primary encounters an unexpected error. We address this issue by introducing a platform that uses container-based lightweight virtualization and an automatic build system to isolate an application so that it can be deployed on different devices without manual intervention. We believe that this advanced deployment procedure can preserve the consistency of primary–backup systems with low implementation complexity. Integrated with a cloud application, the platform can also manage mission-critical systems, communicate with the redundant systems, and detect unexpected errors using fault-detection techniques. Through a realistic experiment with a model electric vehicle, we demonstrate that the platform can improve the reliability of mission-critical systems and reduce hardware dependencies.
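The abstract does not detail how failover is triggered. As a rough illustration only, the following Python sketch shows a generic heartbeat-based primary–backup monitor; it is not the authors' implementation. The names `Node`, `PrimaryBackupMonitor`, `image_digest`, and `timeout` are hypothetical, and checking that both nodes run the same container-image digest is an assumed stand-in for the consistency the platform obtains through automatic container deployment.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    name: str
    image_digest: str           # hypothetical: digest of the deployed container image
    last_heartbeat: float = 0.0  # time (seconds) of the last health report


class PrimaryBackupMonitor:
    """Hypothetical heartbeat-based failure detector with one-shot failover."""

    def __init__(self, primary: Node, backup: Node, timeout: float) -> None:
        # Assumption: identical images on both nodes model primary/backup consistency.
        if primary.image_digest != backup.image_digest:
            raise ValueError("primary and backup run different container images")
        self.primary = primary
        self.backup: Optional[Node] = backup
        self.timeout = timeout

    def heartbeat(self, node: Node, now: float) -> None:
        """Record that `node` reported healthy at time `now`."""
        node.last_heartbeat = now

    def check(self, now: float) -> str:
        """Promote the backup if the primary missed its heartbeat deadline.

        Returns the name of the node acting as primary after the check.
        """
        primary_silent = now - self.primary.last_heartbeat > self.timeout
        if primary_silent and self.backup is not None:
            # The failed primary is dropped; the backup becomes the new primary.
            self.primary, self.backup = self.backup, None
        return self.primary.name
```

For example, with `timeout=0.5`, a primary whose last heartbeat was at `0.0` is still considered healthy at `0.3`, but at `1.0` the monitor promotes the backup. A real deployment would also need to re-provision a fresh backup after failover, which this sketch deliberately omits.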

Acknowledgements

This work was supported by the research fund of Hanyang University (HY-2014-N). W.-J. Lee is the co-corresponding author of this paper.

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Hanyang University, Ansan, Korea
Jaemyoun Lee, Haegeon Jeong & Kyungtae Kang
Department of Computer Science, Inha Technical College, Incheon, Korea
Won-Joo Lee
School of Computer Science and Information Engineering, The Catholic University of Korea, Bucheon, Korea
Hyo-Joong Suh
Department of Computer Science and Information Systems, Texas A&M University–Commerce, Commerce, TX, USA
Dongeun Lee

Corresponding author

Correspondence to Kyungtae Kang.

About this article

Cite this article

Lee, J., Jeong, H., Lee, WJ. et al. Advanced Primary–Backup Platform with Container-Based Automatic Deployment for Fault-Tolerant Systems. Wireless Pers Commun 98, 3177–3194 (2018). https://doi.org/10.1007/s11277-017-4282-4

Published: 27 April 2017
Issue Date: February 2018
DOI: https://doi.org/10.1007/s11277-017-4282-4
