Abstract
Resolving failures quickly is critical in system operation. Previous studies have proposed fault-tolerant approaches, such as flexible system architectures for coping with failures and automatic failure detection systems. In many cases, however, a human operator still identifies the failure, so a support system that reduces the cost of trial and error in resolving failures is needed. In this study, we propose an architecture for system recovery based on solution records on different servers. In an experiment using a prototype, we confirm the feasibility of the proposed system.
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Kasai, T., Takano, K. (2020). An Architecture for System Recovery Based on Solution Records on Different Servers. In: Barolli, L., Hellinckx, P., Natwichai, J. (eds) Advances on P2P, Parallel, Grid, Cloud and Internet Computing. 3PGCIC 2019. Lecture Notes in Networks and Systems, vol 96. Springer, Cham. https://doi.org/10.1007/978-3-030-33509-0_85
DOI: https://doi.org/10.1007/978-3-030-33509-0_85
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-33508-3
Online ISBN: 978-3-030-33509-0
eBook Packages: Engineering, Engineering (R0)