Advanced Primary–Backup Platform with Container-Based Automatic Deployment for Fault-Tolerant Systems

Wireless Personal Communications

Abstract

In mission-critical systems, the primary–backup scheme is a well-established approach to improving reliability and fault tolerance, and it can sustain a high mission success rate despite unexpected errors. However, the scheme must keep the primary and the backup consistent so that the backup can take over whenever the primary encounters an unexpected error. We address this issue with a platform that uses container-based lightweight virtualization and an automatic build system to isolate an application so that it can be deployed on different devices without manual intervention. We believe that such a deployment procedure can maintain the consistency of primary–backup systems with low implementation complexity. Integrated with a cloud application, the platform can also manage mission-critical systems effectively, communicate with the redundant systems, and detect unexpected errors using sophisticated fault-detection technologies. Through a realistic experiment using a model electronic vehicle, we demonstrate that the platform can improve the reliability of mission-critical systems and reduce hardware dependencies.
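
The consistency requirement described in the abstract can be illustrated with a minimal sketch. It is not the platform's implementation: the `Replica` type, the `image` tag, the heartbeat-timeout detector, and the one-second default are all assumptions introduced here for illustration. The sketch treats the primary and the backup as consistent when both run the same container image, and hands control to the backup only when the primary's heartbeat has gone stale.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    """One side of a primary-backup pair (hypothetical model, not from the paper)."""
    name: str
    image: str             # container image tag the replica currently runs
    last_heartbeat: float  # time of the most recent heartbeat, in seconds


def is_consistent(primary: Replica, backup: Replica) -> bool:
    # Assumption: two replicas are consistent when they run the same image.
    return primary.image == backup.image


def select_active(primary: Replica, backup: Replica,
                  now: float, timeout: float = 1.0) -> Replica:
    """Return the replica that should be in control at time `now`."""
    # The primary stays in control while its heartbeat is fresh.
    if now - primary.last_heartbeat <= timeout:
        return primary
    # The primary is presumed failed; fail over only to a consistent backup.
    if not is_consistent(primary, backup):
        raise RuntimeError("backup runs a different image than the primary")
    return backup
```

For example, with both replicas on the same image, `select_active` returns the primary while its last heartbeat is within the timeout and the backup once it is not. Deploying one image to both devices is what keeps the consistency check from failing at the moment of failover.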

Acknowledgements

This work was supported by the research fund of Hanyang University (HY-2014-N). W.-J. Lee is the co-corresponding author of this paper.

Author information

Corresponding author

Correspondence to Kyungtae Kang.

About this article

Cite this article

Lee, J., Jeong, H., Lee, WJ. et al. Advanced Primary–Backup Platform with Container-Based Automatic Deployment for Fault-Tolerant Systems. Wireless Pers Commun 98, 3177–3194 (2018). https://doi.org/10.1007/s11277-017-4282-4

  • DOI: https://doi.org/10.1007/s11277-017-4282-4
